Jay.TL
577 posts

@JayTL00
AI Psychosis | Hermes Agent Practice https://t.co/OwvvnWtDbX

We just published internal data on how much of Claude's development is already being done by Claude:
- Over 80% of all code merged into our codebase is now written by Claude
- It's been months since many researchers at Anthropic hand-wrote code
- The typical Anthropic engineer ships 8x as much code as they did in 2024
- On the most open-ended engineering tasks, Claude's success rate jumped from ~26% to 76% in 6 months
- When research sessions went off-track, Claude proposed a better next step than the human took 64% of the time

We're not at recursive self-improvement yet, but it could come sooner than most expect. I highly recommend reading the full blog post.


It's CODEX THURSDAY and OpenAI came through! 🔥

Codex app 26.616 changes:
• Added Record & Replay on macOS, which turns a demonstrated workflow into a reusable skill.
• Record & Replay is not available in the EU at launch.
• Record & Replay requires Computer Use to be enabled by the user or admin.
• Added bulk actions to automation run history, so runs can be marked as read or archived in bulk.
• Added new deep links for managing SSH connections.
• Improved Browser Use so visible-tab routing and annotations persist when a draft browser session moves to the server.
• Additional performance improvements and bug fixes.

Vercel cooked something genuinely special here. 🤯

They open-sourced the exact framework they use to run 100+ AI agents internally. And the way it works changes how you think about building agents.

It's called Eve.

An agent is a folder.
Tools are files.
Skills are markdown files.
Channels are files.

The folder structure IS your agent.

One command to start:

npx eve@latest init my-agent

No plumbing. No boilerplate. Eve handles durable execution, sandboxed compute, human approvals, evals, tracing, and deployment, all built in.

Add a tool? Drop a TypeScript file.
Add a skill? Drop a markdown file.
Add Slack? One command.
Add a schedule? One more file.
Deploy it? vercel deploy.

How Vercel already runs on Eve:
→ Data analyst agent handles 30K+ questions per month in Slack
→ Sales agent costs $5K/year and returns 32x that
→ Support agent solves 92% of tickets on its own
→ 29% of all Vercel deployments now come from agents

Their bet: Next.js ended the era of hand-rolling websites. Eve ends the era of hand-rolling agents.
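To make the "folder structure IS your agent" idea concrete, here is a rough Python sketch of how a runtime could derive an agent's capabilities purely from what is on disk. This is not Eve's actual API or layout: the subfolder names (tools/, skills/, channels/), the file extensions, and the function describe_agent are all assumptions invented for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical layout, NOT Eve's documented structure: the subfolder names
# and glob patterns below are assumptions made for this illustration only.
LAYOUT = {
    "tools": "*.ts",     # "Tools are files" (assumed to be TypeScript)
    "skills": "*.md",    # "Skills are markdown files"
    "channels": "*",     # "Channels are files" (extension not stated)
}


def describe_agent(root: Path) -> dict[str, list[str]]:
    """Build a manifest of an agent from the files that exist under root."""
    manifest: dict[str, list[str]] = {}
    for kind, pattern in LAYOUT.items():
        folder = root / kind
        if folder.is_dir():
            # Each matching file contributes one entry, named after the file.
            names = sorted(p.stem for p in folder.glob(pattern) if p.is_file())
        else:
            # A missing folder just means the agent has none of that kind.
            names = []
        manifest[kind] = names
    return manifest
```

Under these assumptions, dropping a new file into tools/ changes the manifest on the next scan with no registration code, which is the property the post is describing.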


@UnrealEngine not sure 'simply configure' holds once you're in a real pipeline

spent a week on MCP schema mismatches and my project is way simpler than a UE5 build. does the plugin handle tool discovery automatically or is that still on you?


We're launching code storage and git hosting. Origin gives teams and agents a place to host, review, and collaborate on code. Available this fall. Join the waitlist. cursor.com/origin-waitlist


BREAKING: Microsoft exploring DeepSeek over OpenAI and Anthropic as Copilot Cowork moves to usage-based pricing

“We have users who do hundreds of tasks a week… the consequence is the costs can go very high...”

Jevons paradox

SpaceX ($SPCX) buying @cursor_ai for $60B:

-> Cursor's run-rate revenue reported at $4B ARR, doubling year-over-year, so $60B is likely under 10x forward sales. Not crazy, especially relative to SpaceX's own valuation.
-> Cursor has one of the largest coding workflow databases in the world.
-> Anthropic and OpenAI will likely lose one of their biggest customers over time. Cursor President of Revenue: "More Anthropic is consumed through Cursor than anywhere else in the world."

This makes a lot of strategic sense if you think of Cursor as xAI's missing coding data engine, which locks in xAI's ambition to compete as a frontier lab, rather than a neocloud.

Why's this important? Building a frontier lab like Anthropic or OpenAI is a $1T+ opportunity. The alternative is for xAI to keep renting its compute to labs like Anthropic, but this isn't the same scale of business: CoreWeave for example is at $58B of market cap.

The price for Cursor (likely under 10x forward sales) doesn't seem insane, especially relative to SpaceX's own valuation: BI says Cursor has 700 employees, serves 60% of the Fortune 500, and recently doubled revenue in three months to a $4B run-rate, citing Forbes. Earlier disclosures had Cursor above $1B ARR by late 2025.

xAI has the ingredients of a frontier lab: compute, model, and consumer surfaces. It has Grok, X, Colossus-scale infrastructure. xAI's strategic gap was the highest-PMF AI application category: coding. This is what Cursor brings.

Michael Truell, Cursor co-founder, said on Decoder: Cursor owns the "pane of glass where programmers do their work." While Claude Code and Codex have clearly been on a run, Cursor is still growing fast in large enterprise, and Cursor's product surface creates valuable training data. Cursor can see where AI helps, where AI fails, and where developers correct it. Truell said on a podcast that Cursor's Tab model does "over one billion model calls per day." Sualeh Asif said coding can mean "tens of thousands of tokens of inference per keystroke per person."

It's hard to find better strategic fit:
1. SpaceX has compute, model development expertise, but no frontier model.
2. Cursor has a lot of coding workflows.
3. The workflow produces the data needed to train better coding models.
4. If xAI gets to the frontier of coding, it unlocks a lot of market cap.

From Truell's a16z interview: Cursor would use the data it gets access to to improve the underlying models, then feed that back into the product. "You get this flywheel going."

For xAI, that gets them back into the game. For Cursor, SpaceX solves a lot of problems.
-> Compute: "we at any point in time have a fixed number of GPUs, which means we have a fixed amount of token capacity..."
-> Strategic viability: Cursor's key competitors are its model providers: OpenAI's Codex and Anthropic's Claude Code have been gaining share, forcing Cursor to develop its own model, Composer. xAI makes Cursor a full-fledged model provider, which was always the company's future.

And for Anthropic and OpenAI? They lose one of their biggest partners. Brian McCarthy, Cursor's President of Global Revenue and Field, said on a podcast in April that "More Anthropic is consumed through Cursor than anywhere else in the world." He also said that when OpenAI announced usage of its new model, Cursor represented “40% of that” usage.

The big question now is whether SpaceX can execute on developing a real frontier model with Cursor's data, and whether Cursor can build an app better than Claude Code, which closes the loop between developer intent, codebase context, feedback, inference cost, and product feel. If SpaceX succeeds, Cursor is easily worth $60B.

Research done with @CapRelayHQ, particularly podcasts
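The "under 10x forward sales" claim in the SpaceX/Cursor post can be sanity-checked with quick arithmetic. The sketch below uses only the post's figures ($60B price, $4B run-rate, revenue doubling year-over-year) and assumes, purely for illustration, that "forward sales" means the run-rate one year out at that growth rate. The function name revenue_multiple is made up for this example.

```python
def revenue_multiple(price_b: float, revenue_b: float) -> float:
    """Price-to-revenue multiple, with both inputs in billions of dollars."""
    return price_b / revenue_b


PRICE_B = 60.0      # reported purchase price, from the post
RUN_RATE_B = 4.0    # reported run-rate revenue (ARR), from the post
GROWTH = 2.0        # "doubling year-over-year"; ASSUMED to hold for 12 more months

# Multiple on today's run-rate.
trailing = revenue_multiple(PRICE_B, RUN_RATE_B)

# Multiple on the run-rate one year out, if the doubling continues.
forward = revenue_multiple(PRICE_B, RUN_RATE_B * GROWTH)

# Growth factor at which the forward multiple is exactly 10x.
breakeven_growth = PRICE_B / (10 * RUN_RATE_B)

print(f"trailing: {trailing:.1f}x")                # trailing: 15.0x
print(f"forward: {forward:.1f}x")                  # forward: 7.5x
print(f"10x needs growth of {breakeven_growth}x")  # 10x needs growth of 1.5x
```

On these numbers the deal is 15x current run-rate and 7.5x forward, so the "under 10x" framing holds as long as revenue grows at least 1.5x over the next year.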
