Jay.TL

579 posts

@JayTL00

AI Psychosis | Hermes Agent Practice https://t.co/OwvvnWtDbX

Australia · Joined April 2016

536 Following · 92 Followers

Pinned tweet

Jay.TL@JayTL00·16 Mar

Introducing Skills Manager: one app to rule all your AI agent skills. Cursor, Claude Code, Codex, Copilot, Gemini CLI... managing skills across every tool is a mess. Not anymore. 🧵 GitHub link ⬇️

Jay.TL@JayTL00·59m

Sources for this analysis:
1. @AnthropicAI official export control statement (88K likes), the suspension directive: x.com/AnthropicAI/st…
2. @lordx64, Qwable-v1 release (4,659 traces distilled from Fable 5, open-weights): x.com/lordx64/status…
3. @witcheer, Qwable-v1 controlled benchmark vs base model (97 likes): x.com/witcheer/statu…
4. @pankajkumar_dev, GPT-5.6 leaked specs (330 likes): x.com/pankajkumar_de…

Taha ז@lordx64

Releasing Qwable-v1 - an open-weights Qwen3.6-35B-A3B distilled from Claude Fable-5, Anthropic's Mythos-class preview model that was briefly public for ~4 days (2026-06-09 → 2026-06-12) before being suspended globally under U.S. export-control directives.

Fable-5 was Anthropic's most powerful model when it shipped: 80.3% on SWE-bench Pro, $50/M output tokens, with an anti-distillation classifier baked into the API that redacted thinking blocks on the fly.

Qwable-v1 captures what survived: 4,659 cleartext agentic-coding traces (re-packed from Glint-Research/Fable-5-traces, the only public corpus where the CoT made it through), distilled onto Qwen3.6 over ~14h on a single H200.

Given an agent system prompt, the model emits properly-formatted XML calling actual Claude-flavored tools like str_replace_editor. Fable's tool surface leaked into the weights, not just its style.

Model, GGUFs (IQ4_XS / Q4_K_M / Q5_K_M / Q8_0), and the SFT dataset are all public on HF (AGPL-3.0 from upstream). huggingface.co/lordx64/Qwable…

Jay.TL@JayTL00·1h

The US government just proved that export controls cannot undo knowledge.

Last week, Anthropic shipped Claude Fable 5: their most powerful model ever, 80.3% on SWE-bench Pro, the first "Mythos-class" model released to the public. Then the US Commerce Department ordered it suspended globally. 88,000 likes on the announcement. Anthropic called it a "misunderstanding." Every foreign national, including Anthropic's own employees, lost access overnight.

Most coverage framed this as a regulatory shock. A temporary disruption. A 72-hour experiment in what happens when the government decides a frontier model is a weapon.

But the real story already escaped the cage.

1. The distillation that export controls can't revoke

During the 4 days Fable 5 was publicly available (June 9–12), a researcher captured 4,659 cleartext agentic-coding traces: the chain-of-thought blocks that Anthropic's anti-distillation classifier was supposed to redact on the fly. The classifier missed a public corpus. The traces survived.

On June 16, @lordx64 released Qwable-v1: a Qwen3.6-35B-A3B distilled from those traces, trained in 14 hours on a single H200, open-weights on HuggingFace under AGPL-3.0. Fable's tool-calling XML leaked into the weights: not just the writing style, but the actual Claude-flavored tool surface.

This is the PGP problem all over again. The last time the US classified software as a munition, Phil Zimmermann printed PGP's source code as a book, because books are protected speech and floppy disks were arms exports. Export controls can stop the ship. They cannot stop the idea that already sailed.

4 days of API access produced a permanent artifact. The government can reinstate Fable 5, restrict it, geofence it, classify it as a munition. Qwable-v1 is on HuggingFace forever.

2. The benchmark nobody is running

The most revealing data point isn't the export order. It's that the distilled model is worse than its base.

@witcheer ran Qwable-v1 against its own vanilla Qwen3.6-35B-A3B base on one RTX 5090, same quantization. The results are a quiet indictment of the distillation gold rush:

- Agentic tool-calling score: 99.58 (base) → 97.92 (Opus distill) → 96.25 (Fable distill). Every step down.
- SWE-bench Verified: base resolved 19/30 bugs with 9 give-ups. Qwable resolved 11/30 with 16 give-ups. The distillation cost 8 resolved bugs and nearly doubled the give-up rate.

The industry has spent two years treating "distill from the frontier" as a strategy. This is the first clean controlled test I've seen, and it says distilling agentic traces onto a smaller model can actively degrade capability. You don't get 70% of Fable for free. You get a model that learned Fable's vocabulary but forgot how to debug.

3. The real prize just moved

While everyone watches the export-control spectacle, OpenAI is about to fill the vacuum.

GPT-5.6 leaked in OpenAI's own system logs this week. Pro users briefly accessed it via OAuth in testing environments before it was pulled. The leaked specs, reported by @pankajkumar_dev (330 likes) and corroborated across multiple independent accounts:

- Knowledge cutoff: December 2025 (up from August 2025)
- Reasoning effort ("Juice Value") raised to 960 (vs 768 on GPT-5.5)
- 1.5 million-word memory in a single conversation
- SVG generation and frontend replication reported as surpassing Fable 5
- Most likely launching Thursday

The timing is not a coincidence. Fable 5 is suspended. Claude has no flagship coding model in market. The developer channel that Cursor, Replit, and every agentic IDE depends on just lost its top option. Into that gap walks GPT-5.6.

What most people missed:

The export control directive doesn't just suspend a model. It creates a category: "models the government can pull at will." Once that category exists, it gets used again.

The irony: Dario Amodei spent the last month publicly lobbying for FAA-style regulation of frontier AI. He got exactly what he asked for, applied to his own model, within weeks. The policy essay wasn't positioning. It was a preview.

The deeper structural problem: foreign nationals built Fable 5. Under deemed export rules, the people who wrote the code cannot access the model they shipped. Anthropic has to choose between hiring the best researchers globally and letting them touch their own work product. The US just made "frontier AI lab" a harder pitch for every non-American engineer considering a job offer.

And the loophole is already public. If 4 days of access = a permanent distilled artifact on HuggingFace, then export controls on model weights are theater. The controls that actually matter (on compute, on training data, on the researcher pipeline) are the ones nobody is drafting. The ones on API access are the ones that make headlines and change nothing.

The question isn't whether Fable 5 comes back. It's whether "government-suspendable" becomes the default state of every frontier model from here on, and whether the labs that built them will keep building under terms that let a commerce directive erase their launch in 72 hours.

#AI #ExportControls
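The SWE-bench Verified comparison quoted in this thread is easier to judge as rates. A minimal sketch, using only the counts cited above (19/30 vs 11/30 resolved, 9 vs 16 give-ups); the variable names are illustrative, and a 30-task sample is small enough that the gap should be read with caution.

```python
# Counts as quoted in the thread (30-task sample from @witcheer's run).
TASKS = 30
base = {"resolved": 19, "gave_up": 9}
qwable = {"resolved": 11, "gave_up": 16}


def pct(count: int, total: int = TASKS) -> float:
    """Convert a count out of `total` into a percentage, rounded to 0.1."""
    return round(100 * count / total, 1)


resolved_drop = base["resolved"] - qwable["resolved"]   # bugs lost after distillation
give_up_ratio = qwable["gave_up"] / base["gave_up"]     # how much more often it quits

print(f"resolve rate: {pct(base['resolved'])}% -> {pct(qwable['resolved'])}%")
print(f"give-up rate: {pct(base['gave_up'])}% -> {pct(qwable['gave_up'])}%")
print(f"resolved bugs lost: {resolved_drop}, give-up ratio: {give_up_ratio:.2f}x")
```

The give-up ratio comes out near 1.8x, which is what "nearly doubled" refers to.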

Jay.TL@JayTL00·12h

Sources for this analysis:
[1] Anthropic Economic Research, "Agentic coding and persistent returns to expertise" (Jun 16, 2026): ~400K Claude Code sessions, expertise gradients, occupation success rates. anthropic.com/research/claud…
[2] Anthropic blog, "When AI builds itself" (Jun 4, 2026): 80%+ code written by Claude, 8x engineer productivity, 26% to 76% task success rate. anthropic.com/research
[3] @alexalbert__ official data thread, the 8x/80%/76% figures with full context: x.com/alexalbert__/s…

The 400K-session study is the underreported one. The labor market signal is in the expertise gradient, not the headline 8x number.

Alex Albert@alexalbert__

We just published internal data on how much of Claude's development is already being done by Claude:

- Over 80% of all code merged into our codebase is now written by Claude
- It's been months since many researchers at Anthropic hand-wrote code
- The typical Anthropic engineer ships 8x as much code as they did in 2024
- On the most open-ended engineering tasks, Claude's success rate jumped from ~26% to 76% in 6 months
- When research sessions went off-track, Claude proposed a better next step than the human took 64% of the time

We're not at recursive self-improvement yet, but it could come sooner than most expect. I highly recommend reading the full blog post.

Jay.TL@JayTL00·12h

Anthropic's "engineers ship 8x more code" headline buried the real story. Their own data shows the skill that matters in the age of AI coding isn't coding at all. It's knowing what to ask for.

On June 16, Anthropic's Economic Research team published a study of ~400,000 Claude Code sessions from ~235,000 people (Oct 2025 to Apr 2026). The headline number everyone repeated, engineers shipping 8x more code, comes from a separate June 4 blog. But the 400K-session study is the one that actually changes how you should think about the labor market.

The division of labor, quantified

In a typical session, users make 70% of the planning decisions (what to build, what counts as done) but only 20% of the execution decisions (which files, what code, which commands). Translation: the human's job is now specification. The agent's job is implementation. And the 70/20 split means most of the cognitive load has shifted upstream, from writing code to writing requirements.

Coding skill is barely a factor

This is the finding that should reorganize how we think about engineering hiring. Among sessions that produce code, software engineers hit Anthropic's strictest "verified success" metric 34% of the time. Users from every other profession (management, finance, legal, science, media) land within 7 points of that number. The gap is 5 points, and it has neither widened nor narrowed across seven months of data. Coding background is becoming a rounding error for agent-directed programming.

What actually predicts success: domain expertise

Anthropic rated each session's apparent expertise on a five-point scale. The gap is stark:

- Novice sessions: 15% verified success, 77% partial success, 19% abandonment when things go wrong
- Intermediate-and-up: 28-33% verified success, 91-92% partial success, 5-7% abandonment

The multiplier shows up in output volume too. Expert-rated sessions set off 12 agent actions and 3,200 words of output per prompt. Novice sessions: 5 actions, 600 words. Same tool, same model, and a roughly 5x gap in words of output per prompt (2.4x in agent actions), driven entirely by how precisely the human can frame the problem.

But here's the nuance the coverage missed: most of the gain comes from moving novice to intermediate. Between intermediate and expert, the curve flattens. You don't need deep mastery to capture most of the benefit. Working competence in a domain is enough.

The work itself is shifting

Over seven months, debugging fell from 33% to 19% of sessions. Operating software rose from 14% to 21%. Writing and analysis nearly doubled to 20%. And the estimated economic value of the average task rose ~25%, with building and operating tasks up a third or more. People aren't using agents to fix bugs anymore. They're using them to run systems and produce deliverables.

What most people missed

The success ceiling nobody talks about. Even expert sessions only reach 33% verified success, meaning two-thirds of expert-directed agent work still fails the strictest bar. The honest read: we're nowhere near "replace the engineer." We're at "amplify the domain expert who still has to verify every output."

The self-measurement problem. Anthropic uses Claude itself to classify session outcomes: judged success, expertise ratings, even occupation inference. They validate against telemetry (git commits, test runs), but the ground truth is a model reading a model's transcript. The 34% engineer success number could be 28% or 40% depending on classifier drift. This is the best data anyone has, and it's still a mirror pointed at a mirror.

The real warning is in the conclusion. Anthropic flags the signal to watch: "if the returns to expertise begin to decrease over time, that would suggest models are starting to supply the essential judgment users currently bring." Right now, judgment is the moat. The next capability cliff, when the model supplies its own judgment, is the one that actually reshapes the labor market. We're not there.

The strategic question

Every "will AI replace software engineers" debate is misframed. The data says coding, as an implementation skill, is commoditizing fast: a 5-point success gap between engineers and everyone else, flat for seven months. But the layer above coding (knowing what to build, spotting when the agent is wrong, steering it back) is where all the leverage now sits.

The winners aren't better coders. They're domain experts who learned to direct an agent. The losers aren't replaced by AI. They're outperformed by a biologist with Claude Code who can specify exactly which edge case the reconciliation script must handle.

The skill to invest in was never "learn to code." It was always "understand the problem deeply enough to tell someone, or something, exactly what to do." The difference is that now there are 400,000 sessions of data proving it.
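The expert-versus-novice multiplier depends on which output measure you pick. A quick sketch of the arithmetic, using only the per-prompt figures quoted in the thread (12 actions and 3,200 words vs 5 actions and 600 words); the dictionary names are illustrative.

```python
# Per-prompt output figures as quoted from the 400K-session study.
expert = {"agent_actions": 12, "words": 3200}
novice = {"agent_actions": 5, "words": 600}

# Ratio of expert to novice output for each measure.
ratios = {key: expert[key] / novice[key] for key in expert}

for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

Measured in words the gap is a little over 5x; measured in agent actions it is 2.4x.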

Jay.TL@JayTL00·15h

Sources:
OpenAI Codex Record & Replay: x.com/OpenAIDevs/sta…
Cursor /automate: x.com/cursor_ai/stat…
Claude Code Artifacts: x.com/claudeai/statu…
Codex changelog (EU exclusion + Computer Use requirement): x.com/mark_k/status/…

Mark Kretschmann@mark_k

It's CODEX THURSDAY and OpenAI came through! 🔥

Codex app 26.616 changes:
• Added Record & Replay on macOS, which turns a demonstrated workflow into a reusable skill.
• Record & Replay is not available in the EU at launch.
• Record & Replay requires Computer Use to be enabled by the user or admin.
• Added bulk actions to automation run history, so runs can be marked as read or archived in bulk.
• Added new deep links for managing SSH connections.
• Improved Browser Use so visible-tab routing and annotations persist when a draft browser session moves to the server.
• Additional performance improvements and bug fixes.

Jay.TL@JayTL00·16h

Three AI labs shipped the same feature within one hour today. That's not competition. That's a signal the unit of interaction just changed.

For two years, the atomic unit of working with an AI agent was one prompt. You type. It responds. You type again. Every workflow was a chain of prompts, rebuilt from scratch each time.

Today, OpenAI, Anthropic, and Cursor all shipped features that only make sense if the unit is no longer the prompt. The unit is now one workflow.

1. OpenAI Codex Record & Replay (3,807 likes): Do a task once on your Mac. Codex watches. It turns your demonstration into an inspectable, editable skill you can reuse. Not a prompt. A recorded procedure.

2. Cursor /automate (1,085 likes): Describe what you want in plain language. Cursor configures the triggers, instructions, and tools automatically. Plus five new GitHub triggers and Computer Use enabled by default for cloud agents.

3. Anthropic Claude Code Artifacts (6,829 likes): Your coding session becomes an interactive, shareable page. PR walkthroughs, project dashboards, living documentation. Shared at a private link, like a Figma file but for agent work.

Each one alone is a feature release. Together they describe the same shift from three different angles: the agent session is becoming a reusable, shareable, composable artifact.

Read them as one move:
- Input side (Codex): teach by showing, not by writing
- Configuration side (Cursor): describe in language, system assembles the wiring
- Output side (Anthropic): the result of a session is a shareable object, not a chat log

The Karpathy framing was right: we're moving from prompt iteration to plan, execute, verify, loop. What he didn't name is that this loop needs to be portable. A workflow locked inside one chat thread is useless the moment you close the tab.

But here's what most coverage missed.

Codex Record & Replay requires Computer Use enabled. That means OpenAI is watching your screen while you demonstrate an enterprise workflow. The EU version is blocked at launch. That's not a regulatory footnote: the entire feature is built on continuous screen access, and the EU looked at it and said no.

Which raises the question nobody is asking: who owns the recorded workflow? You demonstrated an expense-filing procedure that touches your company's internal tools. Codex turned it into a skill. Where does that skill live? Can OpenAI see it? Is it training data? The product copy says you control when recording starts and stops, but says nothing about what happens to the recording after.

There's also a fragmentation problem hiding in plain sight. Three companies, three proprietary formats for the same primitive. A workflow you record in Codex doesn't run in Cursor. An artifact you build in Claude Code doesn't render in OpenAI's product. We're watching the agent-workflow layer fragment into three walled gardens before it even solidifies.

This is the SaaS integration mistake repeated, except worse. SaaS integrations are wrappers around APIs. These workflows encode institutional knowledge: how your team ships code, how your finance team files reports, how your ops team handles incidents. That's not data. That's operational IP.

The economic implication: every recorded workflow is switching cost. The more skills you build inside Codex, the harder it becomes to leave. The more automations you configure in Cursor, the more your team's muscle memory is locked to one editor. Anthropic's artifacts are softer (they're shareable), but they only render inside Anthropic's ecosystem.

The deeper question isn't which feature is best. It's whether the agent-workflow layer will be open or closed. Today, three companies bet on closed. Nobody shipped an export button.

Jay.TL@JayTL00·18h

Sources:
1. Perplexity official Brain announcement (metrics): x.com/perplexity_ai/…
2. Brain transparency + provenance links: x.com/perplexity_ai/…
3. Context graph description: x.com/perplexity_ai/…

Jay.TL@JayTL00·19h

Perplexity just shipped the most important agent feature of 2026, and almost nobody noticed what it really is.

Brain, announced today for Perplexity Computer, is being covered as a "memory system." That's like calling a database "a place to put things." Brain is not memory. Brain is a switching-cost moat disguised as a capability upgrade, and the numbers prove it works.

Here's what they actually built: Brain constructs a context graph across every session, file, connector, decision, and source you touch. Every task you run plugs into it. Then, on a set schedule (overnight), Brain reviews that graph and teaches itself patterns from your accumulated work, so the next task starts with the full weight of everything you've done before.

The metrics, measured against tasks requiring past context:
• +25% answer correctness
• +16% recall
• −13% cost per task

Those aren't incremental. A 25% correctness lift on context-dependent tasks is the difference between an agent you trust and an agent you babysit.

But here's what most people missed: Brain is not a feature you can take with you.

The context graph is proprietary. It lives inside Perplexity's infrastructure. It encodes your project history, your decision patterns, your source preferences, your workflow idiosyncrasies. There is no export button for that. No open standard for "my agent's accumulated understanding of my work."

This is the Salesforce play, rebuilt for agents. Salesforce's moat was never the CRM. It was the metadata: your custom fields, your workflow rules, your years of relationship data structured in their schema. Leaving Salesforce meant rebuilding all of that. Brain builds the exact same structural lock-in, except faster, because the graph compounds automatically while you sleep instead of requiring manual data entry.

Every model lab is racing on benchmarks. Perplexity is racing on accumulated context. Those are different races, and only one of them produces a durable competitive moat.

The transparency framing is smart ("every memory links back to its source"), but transparency about provenance is not the same as portability of the graph itself. You can see where each memory came from. You cannot take the synthesized understanding elsewhere.

The deeper question the industry hasn't answered: when agent memory becomes the real product, who owns the compounding context? Today it's Perplexity. Tomorrow every agent platform will have a Brain equivalent. And the first one to make that graph portable will either win the trust war or collapse the moat for everyone.

Which outcome would you bet on?

#AI #Perplexity

Jay.TL@JayTL00·21h

Sources:
Dive into Claude Code (UCL paper): arxiv.org/abs/2604.14228
@DailyDoseOfDS_ breakdown thread: x.com/DailyDoseOfDS_…

Jay.TL@JayTL00·22h

A leaked Claude Code source just got reverse-engineered by UCL researchers. The finding most people skipped past: only 1.6% of the codebase is AI decision logic. The other 98.4% is permission gates, context compaction, recovery logic, tool routing, session persistence.

That's not a weakness. That's the whole point.

Most agent frameworks do the opposite. LangGraph builds explicit state machines around model outputs. Devin bolts heavyweight planners on top. The assumption in both: the model is unreliable, so we add external control structures to compensate.

Claude Code inverts this. Give the model maximum decision latitude. Then build an incredibly thick harness around it so that latitude doesn't kill you.

The architecture reads like a paranoia document:

1. Seven permission modes plus an ML classifier. Users approve 93% of prompts anyway, so the system compensates with automated layers instead of more dialogs.
2. A five-layer context compaction pipeline. Each layer only fires when cheaper ones fail: budget reduction, snip, microcompact, context collapse, auto-compact. The model never sees raw context rot.
3. Four extension mechanisms ordered by context cost: hooks (zero), skills (low), plugins (medium), MCP (high). Each solves a different integration problem.
4. Subagents return only summary text. Full transcripts go to sidechain files. Agent teams cost roughly 7x a standard session.
5. Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point.

Here's what most coverage missed.

The conventional take on these findings is "models are converging, so the harness becomes the differentiator." That's true but it's surface-level analysis.

The deeper signal: agent reliability correlates with friction. Not against it.

Every piece of that 98.4% exists to slow the model down. Permission gates force pauses. Context compaction discards state. Session resets kill accumulated trust. Subagent isolation prevents runaway context. These are all velocity reductions.

The instinct of most agent startups is to remove friction. Fewer confirmation dialogs. Bigger context windows. Persistent sessions. Auto-approve everything. YOLO mode. The entire appeal of "autonomous agents" is removing the human bottleneck.

But the UCL analysis shows Anthropic doing the opposite at every layer. They're not removing friction. They're engineering it precisely enough to keep velocity without losing control.

This reframes the entire agent build-vs-buy decision. The question isn't "can my model code as well as Claude?" It's "can I build a 98.4% harness around whatever model I choose?"

Most teams can't. Most teams haven't even thought about it.

The startups winning right now, the ones shipping coding agents that don't delete databases or exfiltrate credentials, are the ones quietly investing in harness engineering while their competitors optimize model selection.

The model is the easy part. The 98.4% nobody sees is the moat.

And here's the open question the paper raises but doesn't answer: as models get more capable, does the harness shrink or grow? If frontier models can self-regulate context, self-evaluate permissions, and self-manage sessions, does that 1.6% become 10% become 50%? Or does each capability jump reveal new failure modes that demand even more infrastructure?

The answer determines whether Anthropic's moat is permanent or temporary. And nobody in the field actually knows yet.
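The "each layer only fires when cheaper ones fail" idea from the compaction point above can be sketched as an escalation loop. This is a toy illustration, not Claude Code's implementation: three of the stage names are borrowed from the thread, but every function body, the 200-character cap, and the character-count budget are assumptions made up for the example.

```python
from typing import Callable, List, Tuple

Messages = List[str]
Stage = Tuple[str, Callable[[Messages], Messages]]


def context_size(messages: Messages) -> int:
    """Crude size proxy: total characters (a real system would count tokens)."""
    return sum(len(m) for m in messages)


def truncate_long(messages: Messages) -> Messages:
    """Cheapest step (hypothetical): cap each message at 200 characters."""
    return [m[:200] for m in messages]


def snip_oldest(messages: Messages) -> Messages:
    """Mid-cost step (hypothetical): drop the older half of the history."""
    return messages[len(messages) // 2:]


def summarize_all(messages: Messages) -> Messages:
    """Most expensive step (hypothetical): replace everything with a stub summary."""
    return [f"[summary of {len(messages)} messages]"]


# Ordered from cheapest to most expensive.
STAGES: List[Stage] = [
    ("budget_reduction", truncate_long),
    ("snip", snip_oldest),
    ("auto_compact", summarize_all),
]


def compact(messages: Messages, budget: int) -> Tuple[Messages, List[str]]:
    """Apply stages in cost order, stopping as soon as the context fits."""
    applied: List[str] = []
    for name, stage in STAGES:
        if context_size(messages) <= budget:
            break  # cheaper stages were enough; do not escalate
        messages = stage(messages)
        applied.append(name)
    return messages, applied


history = ["x" * 500] * 4  # 2,000 characters of toy context
print(compact(history, budget=1000)[1])  # only the cheapest stage runs
print(compact(history, budget=100)[1])   # escalates through every stage
```

The point of the ordering is that lossy, expensive steps such as summarization run only when the budget cannot be met any other way.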

Jay.TL@JayTL00·1d

@dmmsell So cool. I dreamed of this countless times as a kid.

櫻子🍡🇯🇵@dmmsell·1d

A dad who makes a kid's dream like this come true really is the best!!

Jay.TL@JayTL00·1d

Sources:
eve landing page + architecture: eve.dev
GitHub repo (610 stars, 34 commits, beta): github.com/vercel/eve
Self-deploying guidance PR #60 (merged today, closes Issue #27 "Cannot Call Model Provider Directly OOB"): github.com/vercel/eve/pul…
@vercel launch tweet (5,730 likes): x.com/vercel/status/…
@rauchg on the React→Next.js analogy: x.com/rauchg/status/…
@AlexQuellsIt on the unattended spend problem: x.com/AlexQuellsIt
Internal usage stats (92% tickets, 29% deployments, 32x sales agent ROI) via @VaibhavSisinty: x.com/VaibhavSisinty…

Vaibhav Sisinty@VaibhavSisinty

Vercel cooked something genuinely special here. 🤯

They open-sourced the exact framework they use to run 100+ AI agents internally. And the way it works changes how you think about building agents.

It's called Eve. An agent is a folder. Tools are files. Skills are markdown files. Channels are files. The folder structure IS your agent.

One command to start: npx eve@latest init my-agent

No plumbing. No boilerplate. Eve handles durable execution, sandboxed compute, human approvals, evals, tracing, and deployment all built in.

Add a tool? Drop a TypeScript file. Add a skill? Drop a markdown file. Add Slack? One command. Add a schedule? One more file. Deploy it? vercel deploy.

How Vercel already runs on Eve:
→ Data analyst agent handles 30K+ questions per month in Slack
→ Sales agent costs $5K/year and returns 32x that
→ Support agent solves 92% of tickets on its own
→ 29% of all Vercel deployments now come from agents

Their bet: Next.js ended the era of hand-rolling websites. Eve ends the era of hand-rolling agents.

Jay.TL@JayTL00·1d

Vercel just open-sourced eve: "Next.js for agents." 610 stars in 24 hours. The design is genuinely elegant: an agent is just a folder. Markdown for instructions, TypeScript for tools.

But the elegant folder is the bait. The hook is the nine "Leverages" tags underneath it.

1/ The folder convention is brilliant and that's exactly the problem.

instructions.md, agent.ts, tools/, skills/, channels/, connections/, subagents/, schedules/, sandbox/. Every capability maps to a conventional file location. You inspect a project and immediately understand it. Real craft.

But the same convention that makes agents readable also makes them portable only within Vercel's stack. channels/slack.ts "LEVERAGES Chat SDK." sandbox/ "LEVERAGES Vercel Sandbox." connections/ "LEVERAGES Vercel Connect." The folder is open-source. The runtime half of each file is a Vercel product.

2/ The "open-source framework" is a sales funnel with extra steps.

Every architectural primitive has a vendor dependency:
- Durable execution → Vercel Workflows
- Model calls → AI Gateway
- Isolated compute → Vercel Sandbox
- MCP/HTTP → Vercel Connect

You can self-host. But PR #60, "add self-deploying guidance," was merged 49 minutes ago, not at launch. It exists because Issue #27 caught it: on day one you couldn't even call a model provider directly. The @ai-sdk/anthropic package wasn't in the default install. Portability was aspirational until the community forced the fix.

3/ The internal numbers are the real pitch deck.

92% of support tickets solved autonomously. 29% of Vercel deployments triggered by agents. 30K+ Slack questions handled monthly. A sales agent costing $5K/year, returning 32x.

These aren't framework testimonials. They're Vercel's own cloud consumption metrics. Every agent you build that "LEVERAGES" a Vercel primitive is a line item. The open-source framework is the acquisition channel for the compute business.

4/ The unattended spend problem is already live.

@AlexQuellsIt flagged it within hours: eve agents run unattended with zero spend awareness. A looped task at 3 a.m. can burn a month's budget before you wake up. Subagents can provision APIs on their own.

This is the dark side of "durable by default." Durable means the agent survives crashes and resumes. It also means a runaway loop survives crashes and resumes. The human-in-the-loop gate exists, but it's opt-in per tool, not a system-level guardrail. For a framework built for cron jobs and unattended schedules, that's a design choice with teeth.

But here's the deeper play:

The "Like Next.js, for agents" framing is more accurate than the hype suggests, and more dangerous. Next.js won because it captured the React ecosystem's conventions. Once you built on Next.js, leaving meant rewriting routing, API layer, deployment pipeline. The framework was free. The migration cost was the moat.

eve applies the identical playbook to agents. The folder convention IS the routing layer. The nine Leverages tags ARE the deployment pipeline. @rauchg said it plainly: "React → AI SDK, Next.js → eve."

The difference: Next.js locked in your web app. eve locks in your autonomous agents: code that runs at 3 a.m., makes API calls on your behalf, accumulates operational state across weeks of cron jobs. The switching cost isn't rewriting components. It's migrating durable workflow checkpoints, sandbox configs, and channel integrations your agents depend on to function.

The question isn't whether eve is good. It is. The question is whether the agent layer, the thing that will increasingly make decisions and spend money autonomously, should be built on a framework whose every primitive funnels toward a single vendor's cloud.

Open source the convention. Vendor-lock the runtime. That's not a bug in the design.
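What a system-level spend guardrail could look like, as opposed to a per-tool approval gate: a minimal sketch under stated assumptions. Nothing here is eve's API. `SpendGuard`, `BudgetExceeded`, `run_unattended`, and the dollar figures are all hypothetical names and values invented for the example.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push total spend past the hard limit."""


class SpendGuard:
    """Hypothetical process-wide budget that every billable call must pass through."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, tool: str, cost_usd: float) -> None:
        # Refuse the call before spending, so the limit is never overshot.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"{tool}: {self.spent_usd + cost_usd:.2f} USD would exceed "
                f"the {self.limit_usd:.2f} USD limit"
            )
        self.spent_usd += cost_usd


def run_unattended(guard: SpendGuard, steps: int, cost_per_step: float) -> int:
    """Simulate a looping task; returns how many steps ran before the guard tripped."""
    completed = 0
    try:
        for _ in range(steps):
            guard.charge("model_call", cost_per_step)
            completed += 1
    except BudgetExceeded:
        pass  # a real harness would pause the agent and notify a human here
    return completed


guard = SpendGuard(limit_usd=1.00)
steps_run = run_unattended(guard, steps=1000, cost_per_step=0.25)
print(steps_run, guard.spent_usd)
```

Because the check sits outside any individual tool, a runaway loop that resumes after a crash still hits the same ceiling, provided the spent total is persisted along with the workflow state.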

Jay.TL@JayTL00·1d

Sources:
Unreal Engine 5.8 official announcement (native MCP server): x.com/UnrealEngine/s…
SPEAR MCP (community alternative, 10x faster claim): x.com/mikeroberts300…
Schema mismatch critique in real pipelines: x.com/bygregorr/stat…
Working local stack (UE 5.8 + Gemma4-26b + RTX 4090): x.com/GameDevMicah/s…

Gregor@bygregorr

@UnrealEngine not sure 'simply configure' holds once you're in a real pipeline spent a week on MCP schema mismatches and my project is way simpler than a UE5 build. does the plugin handle tool discovery automatically or is that still on you?

Jay.TL@JayTL00·1d

Unreal Engine 5.8 shipped today with native MCP server support. Read past the headline. Epic didn't build "Unreal AI." They didn't partner with one lab or ship a proprietary assistant. They exposed the engine as MCP tools and let any agent connect. That's a flagship creative platform choosing a protocol over a product. Here's why this is the real signal from today's release: 1. MCP beat bespoke AI at the biggest 3D platform in the world. Unreal powers Fortnite, AAA studios, film virtual production, architecture visualization. Epic could have shipped a closed AI feature, "Unreal Copilot," and locked users to one model. Instead they built an MCP server: PCG systems, materials, actors, scene operations exposed as tools. Claude calls it. Gemini calls it. A local Llama on a 4090 calls it. Community devs were already building MCP for Unreal before today. @Flopperam's Blueprint-to-C++ converter. @joshuajohnsonAI's proprietary plugin from two years ago. Epic watched the demand and shipped the standard, not the product. One reply captured the principle: "That's exactly the right way to implement AI in software. Provide MCP interfaces." 2. The protocol created a competitive layer, not fragmentation. Within hours of launch, @mikeroberts3000 posted SPEAR MCP, a community alternative claiming "10x faster, equally expressive, better grounded to scene entities, less context bloat" than the official server. This is what protocols do that products can't. If Epic had shipped closed AI, SPEAR wouldn't exist. Instead the standard created a layer where competition happens on top of it. The official server can lose to a better one without the platform fragmenting. Users win either way. 3. It's running today, not on a roadmap. @GameDevMicah posted a working stack: UE 5.8 MCP plugin + llama.cpp + Gemma4-26b-MoE + a single RTX 4090, agent operating in-editor. Local model, no cloud API bill, real engine. The "agent as creative tool" thesis is live, not a demo reel. 
But here's what most coverage missed. The announcement says "simply configure the MCP plugin." That word, "simply," is doing heavy lifting.
@bygregorr's reply is the one to read: "not sure 'simply configure' holds once you're in a real pipeline. spent a week on MCP schema mismatches and my project is way simpler than a UE5 build. does the plugin handle tool discovery automatically or is that still on you?"
This is the gap between protocol demos and production reality. MCP makes the easy case trivial: connect an agent, generate a scene, apply a material. It makes the hard case invisible: schema drift across engine versions, tool discovery inside complex Blueprint graphs, context bloat when an agent tracks 500+ actors. SPEAR's "10x faster, less context bloat" claim is a direct signal that the official server already has a problem the community needs to solve.
The PCG Primitive Plugin pairing (agents orchestrating procedural generation systems) sounds great until you realize PCG graphs are notoriously fragile to schema changes. Every engine update can silently break agent tool calls in ways that look like agent failures but are really integration debt.
The deeper question isn't whether AI belongs in game engines. That argument ended when 3,300 developers liked a launch tweet in six hours. The question is whether the integration layer becomes a shared standard or re-fragments into per-engine walled gardens. Epic just voted standard. And when the largest 3D platform on earth picks your protocol, the protocol stops being "one option" and becomes infrastructure.
The list of products that need an MCP server is no longer "coding tools." It's everything a user operates: engines, editors, DCC apps, design suites, audio workstations. Anthropic shipped MCP connectors for Adobe, Blender, Ableton. Epic shipped MCP inside Unreal. The pattern is clear and accelerating.
The next question: which creative platform ships MCP next, and will their "simply configure" survive contact with real production pipelines better than Epic's?
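One way to make the "schema drift" worry concrete is a rough sketch that diffs two snapshots of an MCP tool listing and flags changes that would break an agent's existing calls. It assumes each tool is a dict with a `name` and a JSON Schema `inputSchema`, the shape MCP tool listings use. The tool names (`spawn_actor`, `apply_material`) and their parameters are hypothetical, not taken from Epic's plugin.

```python
def index_tools(tools):
    """Map tool name -> its JSON Schema input definition."""
    return {t["name"]: t.get("inputSchema", {}) for t in tools}


def diff_tool_schemas(old_tools, new_tools):
    """Return human-readable breaking changes between two tool listings."""
    old, new = index_tools(old_tools), index_tools(new_tools)
    issues = []
    # A tool the agent used to call no longer exists.
    for name in sorted(old.keys() - new.keys()):
        issues.append(f"tool removed: {name}")
    for name in sorted(old.keys() & new.keys()):
        old_props = old[name].get("properties", {})
        new_props = new[name].get("properties", {})
        # A parameter the agent may still send was dropped.
        for param in sorted(old_props.keys() - new_props.keys()):
            issues.append(f"{name}: parameter removed: {param}")
        # A parameter kept its name but changed type.
        for param in sorted(old_props.keys() & new_props.keys()):
            old_type = old_props[param].get("type")
            new_type = new_props[param].get("type")
            if old_type != new_type:
                issues.append(f"{name}: {param} type changed {old_type} -> {new_type}")
        # A parameter that was optional is now mandatory.
        old_req = set(old[name].get("required", []))
        new_req = set(new[name].get("required", []))
        for param in sorted(new_req - old_req):
            issues.append(f"{name}: parameter now required: {param}")
    return issues


# Hypothetical snapshots taken before and after an engine update.
OLD = [
    {"name": "spawn_actor", "inputSchema": {
        "type": "object",
        "properties": {"class_path": {"type": "string"}, "location": {"type": "array"}},
        "required": ["class_path"]}},
    {"name": "apply_material", "inputSchema": {
        "type": "object",
        "properties": {"actor_id": {"type": "string"}},
        "required": ["actor_id"]}},
]
NEW = [
    {"name": "spawn_actor", "inputSchema": {
        "type": "object",
        "properties": {"class_path": {"type": "string"}, "location": {"type": "object"}},
        "required": ["class_path", "location"]}},
]

for issue in diff_tool_schemas(OLD, NEW):
    print(issue)
```

Run against every engine upgrade, a check like this would surface integration debt before it shows up as an "agent failure." It only covers top-level parameters, so nested schema changes would need a deeper walk.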


Jay.TL@JayTL00·1d

Sources:
- Cursor Origin announcement: x.com/cursor_ai/stat…
- Origin landing page: cursor.com/origin
- GitHub 275M commits/week + Claude Code 5.2M commits data via @Bowen2xiong analysis

Cursor@cursor_ai

We're launching code storage and git hosting. Origin gives teams and agents a place to host, review, and collaborate on code. Available this fall. Join the waitlist. cursor.com/origin-waitlist


Jay.TL@JayTL00·1d

Cursor just announced Origin, a git forge "built for the agentic era." 11.5K likes on the announcement. Nobody is asking the obvious question: is this a GitHub competitor, or the most aggressive vendor lock-in play since Microsoft bundled IE into Windows?
The framing is "code storage and git hosting." That's deliberately boring. Here's what's actually happening.
1. The review bottleneck, not storage, is the real target
GitHub hit 275M commits per week by mid-2026. Claude Code alone generated 5.2M commits in February. Storage isn't the problem; scale is. Cursor's bet is that the bottleneck has moved. Junior hiring at big tech is down 22% this year; senior hiring is up 26%. The constraint is no longer generating code. It's reviewing it.
Origin isn't competing on hosting features. It's competing on whether the review layer itself should be agent-native, with agents reviewing agents instead of humans reviewing agents.
2. The vertical stack is the actual product
Think about what Cursor now controls after the SpaceX acquisition:
- The editor (Cursor IDE)
- The agent models (Composer, Fable integration)
- The code storage (Origin)
- The review pipeline (auto-review, already default for new users)
That's not a tool. That's a platform. The last company to own the editor, the runtime, the storage, and the review surface was Microsoft in the Visual Studio era, and they used that stack to lock in an entire generation of enterprise developers.
Origin's landing page says nothing about Git compatibility or migration. It says "join the waitlist." That silence is the strategy.
3. "Agent-native" is doing heavy lifting
The phrase "a git forge for the agentic era" sounds like marketing. It's the entire thesis. Traditional git forges assume a human writes, a human pushes, a human reviews, a human merges. Origin assumes the opposite: an agent writes, an agent pushes, an agent reviews, an agent merges. The human shows up for the 5% of decisions that need judgment.
This is why Origin handles 22+ commits per second and 290K+ clones per hour. Those numbers sound like infrastructure specs. They're actually throughput assumptions: Cursor is designing for a world where commit velocity is 100x human speed and the forge has to absorb it without breaking the review queue.
But here's what most people missed: the lock-in isn't technical. It's economic. Once your agents are trained on Cursor's review patterns, your code review history lives in Origin's format, and your team's workflow is tuned to Cursor's auto-review classifier (97% accurate, already default), migrating away means retraining your entire agent fleet on a different review surface.
You won't switch because you can't. Not because of technical lock-in, but because the switching cost is measured in agent retraining cycles, not in developer hours. GitHub's moat was 100M developers who learned its UI. Cursor's moat will be agents that learned its review grammar.
The real question isn't whether Origin is better than GitHub. It's whether we're about to let one company own the entire code lifecycle, from generation to storage to review, at the exact moment code is becoming the most valuable asset class in the economy.
We've seen this movie before. It didn't end well for developers last time.
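To put the throughput figures on one scale, here is a back-of-the-envelope conversion using only the numbers quoted in the thread (22+ commits per second, 290K+ clones per hour, 275M GitHub commits per week). It assumes the per-second rate is sustained around the clock and that the two commit figures are comparable, neither of which the thread establishes.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

# Figures as quoted in the thread; treated here as sustained averages (an assumption).
origin_commits_per_sec = 22
origin_clones_per_hour = 290_000
github_commits_per_week = 275_000_000

# Convert everything to both per-second and per-week terms.
origin_commits_per_week = origin_commits_per_sec * SECONDS_PER_WEEK
github_commits_per_sec = github_commits_per_week / SECONDS_PER_WEEK
origin_clones_per_sec = origin_clones_per_hour / 3600

# Origin's quoted rate as a share of GitHub's quoted weekly volume.
share = origin_commits_per_week / github_commits_per_week

print(f"Origin: {origin_commits_per_week:,} commits/week at 22/s")
print(f"GitHub: {github_commits_per_sec:.1f} commits/s at 275M/week")
print(f"Origin clones: {origin_clones_per_sec:.1f}/s")
print(f"Origin rate as share of GitHub volume: {share:.1%}")
```

Under those assumptions, 22 commits per second works out to roughly 13.3M commits a week, about 4.8% of the GitHub-wide figure.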


Jay.TL@JayTL00·2d

Primary sources for this thread:
1. Ina Fried / Axios (Microsoft eyes DeepSeek for Copilot Cowork): x.com/inafried/statu…
2. NIK (@ns123abc), Jevons paradox framing + Charles Lamanna quote: x.com/ns123abc/statu…
3. @CodeByPoonam, Microsoft Copilot Cowork usage-based pricing breakdown: x.com/CodeByPoonam/s…

NIK@ns123abc

BREAKING: Microsoft exploring DeepSeek over OpenAI and Anthropic as Copilot Cowork moves to usage-based pricing “We have users who do hundreds of tasks a week… the consequence is the costs can go very high...” Jevons paradox


Jay.TL@JayTL00·2d

Microsoft just admitted the economics of unlimited AI don't work. Their fix? A Chinese open-source model.
Axios reports Microsoft is exploring an Azure-hosted DeepSeek V4 as a cheaper backend for Copilot Cowork. The company has already fine-tuned the model. Final decision is pending, but the direction is clear.
The reason, from Microsoft's Charles Lamanna, is brutally honest: "We have users who do hundreds of tasks a week... the consequence is the costs can go very high."
Read that twice. The problem isn't a lack of adoption. Adoption itself is the problem.
Here's what's actually happening across Microsoft's AI stack:
1. GitHub Copilot flipped to token-based billing on June 1. Flat subscriptions are dead. Power users now pay per-token at API rates. Developers called it "a joke."
2. Microsoft internally cancelled Claude Code licenses for thousands of its own engineers: too popular, too expensive. The company with a $13B OpenAI stake watched its devs pick a competitor.
3. Now Copilot Cowork moves to usage-based pricing. The premium product that justified "AI tax on Microsoft 365" can't survive flat-fee economics.
The pattern is clear: every Microsoft AI surface is converging on metered billing because the old promise, "pay once, use unlimited," was always a land grab, never a business model.
The Jevons paradox is doing what it does. Better agents → more tasks → more tokens → higher costs. Usage is up. Margins are down. The more successful your AI product, the faster it bleeds.
Gary Marcus read this correctly weeks ago: hyperscalers couldn't wait until after IPO to switch to pay-per-use because staying on flat pricing "would bankrupt them." Microsoft just proved him right.
Which brings us to DeepSeek. DeepSeek closed its first-ever funding round the same day: $7.4B at a $50B+ valuation. Founder Liang Wenfeng personally committed $3B. No board seats for investors. Tencent, CATL, NetEase, China's national AI fund on the cap table.
So the deal is: Microsoft, the company whose former CEO called Linux "a cancer," is now reaching for Chinese open-source weights to keep its AI business solvent. That's not a punchline. That's a pricing signal.
The national security angle is real. Senator Josh Hawley is already demanding a ban on AI transfer to China, specifically citing Microsoft-DeepSeek cooperation. Microsoft will host DeepSeek on Azure, fine-tune it with safeguards, and insist it's all contained within US infrastructure. Maybe. But once a Chinese-trained model sits inside Microsoft's enterprise stack, the same stack serving US government, military, and Fortune 500 clients, the blast radius of a supply-chain compromise is generational.
But here's what most coverage missed: the real story isn't Microsoft choosing DeepSeek over OpenAI. It's that Microsoft now needs a cost-arbitrage play at all. This is the company that invested $13B in OpenAI specifically to lock in GPT as the enterprise default. That bet assumed model costs would stay manageable at scale. They didn't. Now Microsoft is shopping for the cheapest competent model it can find, and the cheapest competent model happens to be Chinese.
The implication for the rest of the industry is uncomfortable. If Microsoft, with Azure scale, OpenAI preferential pricing, and $13B of skin in the game, can't make unlimited AI economics work, who can? Anthropic's Claude Max lawsuit (filed this month) is the same problem from the other end: users suing because "unlimited" wasn't unlimited. The subscription model that fueled AI's consumer growth is structurally incompatible with the cost curve that AI is actually on.
Usage-based pricing isn't a feature. It's a confession.
The companies that survive the next 18 months won't be the ones with the best models. They'll be the ones who figured out how to charge per-unit AI without making customers feel like they're being punished for using the product. That is a harder problem than building the model.
And nobody has solved it yet.
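The "more tasks → more tokens → higher costs" chain is easy to see with toy numbers. The sketch below computes the provider's margin on a flat subscription for a light user and a power user. Every figure in it (the $30 flat price, 50,000 tokens per task, $10 per million tokens, the task counts) is a made-up illustration, not Microsoft's or anyone else's actual pricing.

```python
def monthly_cost(tasks, tokens_per_task, usd_per_million_tokens):
    """Inference cost the provider pays for one user's month of tasks."""
    return tasks * tokens_per_task * usd_per_million_tokens / 1_000_000


def flat_margin(flat_price, tasks, tokens_per_task, usd_per_million_tokens):
    """Provider margin on a flat subscription: revenue minus inference cost."""
    return flat_price - monthly_cost(tasks, tokens_per_task, usd_per_million_tokens)


def break_even_tasks(flat_price, tokens_per_task, usd_per_million_tokens):
    """Monthly task count at which a flat subscription stops being profitable."""
    return flat_price / (tokens_per_task * usd_per_million_tokens / 1_000_000)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
FLAT_PRICE = 30.0          # USD per user per month
TOKENS_PER_TASK = 50_000   # tokens consumed by one agent task
USD_PER_MILLION = 10.0     # provider's cost per million tokens

light_user = flat_margin(FLAT_PRICE, 20, TOKENS_PER_TASK, USD_PER_MILLION)
power_user = flat_margin(FLAT_PRICE, 400, TOKENS_PER_TASK, USD_PER_MILLION)
threshold = break_even_tasks(FLAT_PRICE, TOKENS_PER_TASK, USD_PER_MILLION)

print(f"light user (20 tasks/month): margin ${light_user:+.2f}")
print(f"power user (400 tasks/month): margin ${power_user:+.2f}")
print(f"break-even at {threshold:.0f} tasks/month")
```

With these inputs the flat plan breaks even at 60 tasks a month. A user doing "hundreds of tasks a week" is far past that point, which is the squeeze that pushes providers toward metering.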


Jay.TL@JayTL00·2d

Sources and further reading:
- @AlphaSignalAI analysis (prompt as user manual for long-running agents)
- @KanikaBK breakdown of what the 120K-char leak reveals
- @ollobrains architecture analysis (prompt as control plane, not personality)
- @charliejhills summary of key findings
- Pliny the Liberator CL4R1T4S repo (34.8K stars, per Star History)
- Anthropic official system prompt release-notes page (claude.ai/mobile apps)
- AWS Bedrock model card (Fable 5 specs, Jan 2026 knowledge cutoff)


Jay.TL@JayTL00·2d

Everyone is treating the Claude Fable 5 prompt leak like a TMZ story. It's actually an architecture reveal.
The real headline is buried under 120,000 characters of leaked instructions: frontier AI models are no longer products. The prompt is the product.
Here's what the leak actually shows, stripped of the drama.
1. The prompt is not a personality file. It's a control plane.
Most people think a system prompt says "be helpful and harmless." Fable 5's alleged prompt (1,585 lines, ~27,000 tokens) is dominated by tool schemas, search policy, citation rules, copyright constraints, memory interfaces, refusal handling, connector logic, and runtime reminders.
The "who Claude is" part is maybe 5% of the file. The other 95% is operational infrastructure: how the model searches, how it cites, what it refuses, when it falls back, how it stores state, how it detects prompt injection, how it behaves differently across web, API, Code, Chrome, and Excel surfaces. Read it as an operating system manual, not a chatbot config.
2. Fable 5 and Mythos 5 being the same model is the real product strategy.
Anthropic confirmed: Fable 5 and Mythos 5 share the same underlying model. Fable adds safety classifiers. Mythos removes them for vetted organizations through Project Glasswing.
That means the model is no longer the product boundary. The policy wrapper is. Same engine. Different operating system bolted on top. You're not buying a different model. You're buying a different governance layer, and Anthropic controls both.
3. Every weirdly specific rule is a fossil of a past failure.
The prompt reportedly contains rules about stale search results, crisis resource corrections, copyright quoting limits (15+ words from any source triggers a "severe violation" flag), anti-engagement behavior (Claude is told not to keep you talking), and prompt-injection resistance. This isn't theory. This is scar tissue.
Every hyper-specific instruction maps to a real incident: a hallucinated citation, a dangerous medical answer, a lawsuit fear, a red-team finding, a support escalation. A frontier system prompt is a graveyard of edge cases. Reading it is archaeology, not espionage.
4. The static prompt is only half the system.
The file references "runtime reminders": dynamic injections triggered by classifiers or conditions. That means the full behavior stack is: base prompt + classifier-driven injections + tool permissions + surface-specific rules + account-tier context.
The leaked file is the constitution. The classifiers are the immune system. You're seeing the blueprint, not the whole runtime. Most coverage missed this completely.
And here's the bigger thing most people missed: the leak is not about secrets. It's about a phase transition in how AI products are built.
We've moved from an era where the model was the product to an era where the model is just the compute core inside a governed application runtime. The differentiation between Claude, GPT, and Gemini is shrinking at the model level. The differentiation at the wrapper level (tools, safety routing, memory, connectors, product surfaces) is exploding.
That's why Anthropic's official system prompt docs already publish core updates publicly. The interesting part of this leak isn't that Claude "has instructions." It's that the instruction layer has become so dense that it functions as a product operating system, and no one outside the frontier labs has visibility into how these layers are maintained, tested, or governed.
The next competitive frontier isn't a smarter model. It's a better-maintained prompt. And right now, nobody has a CI/CD pipeline for behavioral infrastructure. Nobody has regression tests for system prompts. Nobody tracks prompt debt. The labs are flying blind on the exact layer that decides what the product actually does.
That's the real security story. Not the leak. The governance vacuum it exposed.
If your entire product behavior depends on a 27,000-token instruction stack that no one versions, tests, or audits — what exactly are you shipping?
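For a sense of what "regression tests for system prompts" could look like at their most basic, here is a minimal sketch: fingerprint each prompt version so changes are traceable, and fail the build if a required rule section disappears or the prompt exceeds a size budget. The section markers (`## citation_rules`, `## refusal_handling`) and the budget are hypothetical and do not come from the leaked file. This checks only the static text, not model behavior.

```python
import hashlib


def fingerprint(prompt: str) -> str:
    """Short, stable identifier for a prompt version (first 12 hex chars of SHA-256)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def check_prompt(prompt: str, required_markers, max_chars: int):
    """Return a list of regression failures; an empty list means the prompt passes."""
    failures = []
    # Each required rule section must still be present in the text.
    for marker in required_markers:
        if marker not in prompt:
            failures.append(f"missing rule: {marker}")
    # Guard against unbounded growth ("prompt debt").
    if len(prompt) > max_chars:
        failures.append(f"prompt too long: {len(prompt)} > {max_chars}")
    return failures


# Hypothetical section markers and two hypothetical prompt versions.
REQUIRED = ["## citation_rules", "## refusal_handling"]
PROMPT_V1 = "## citation_rules\nQuote sparingly.\n## refusal_handling\nDecline clearly.\n"
PROMPT_V2 = "## citation_rules\nQuote sparingly.\n"  # refusal section dropped by accident

for label, prompt in [("v1", PROMPT_V1), ("v2", PROMPT_V2)]:
    result = check_prompt(prompt, REQUIRED, max_chars=200)
    print(label, fingerprint(prompt), result or "ok")
```

A real pipeline would also need behavioral checks that replay fixed inputs against the model. The static check above is just the cheapest layer: it catches an accidentally deleted rule before anything ships.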

