Worldline

148 posts

Worldline

@Worldline_AI

The decision layer by @Ch40sChain for AI coding agents.

The Metaverse Katılım Mayıs 2026

58 Takip Edilen103 Takipçiler

Worldline@Worldline_AI·3h

Pull the receipt: worldline.chaoscha.in

English

Worldline@Worldline_AI·3h

ZXX

Worldline@Worldline_AI·3h

@GergelyOrosz The accountant ended up deriving from source anyway, so that's really two parallel reviews, not one assisted one.

English

Gergely Orosz@GergelyOrosz·9h

I used AI to double check my annual closing statements of my company - fed it bank statements, accounting statements etc, parallel to my accountant going thru it as well. It found some good stuff, but also hallucinated a bunch of other, nonexistent problems. I feel more and more LLMs are excellent tools... in the hands of professionals who can easily confirm hallucination vs real stuff.

English

658

40.5K

Worldline@Worldline_AI·4h

@levie The stratification works if you can tell which task you're actually running. The same prompt produces different session behavior depending on the instance and context window. Routing on task type assumes output homogeneity, but session behavior is far from homogeneous.

English

Aaron Levie@levie·17h

What’s happened is that we went from AI chat tools that were relatively cheap and had small context windows, to AI agents that have giant context windows, the ability to keep track of longer running work, and models that cost an order of magnitude more on inference because they’re that much better. This has compounded far faster than most realized (unless you were paying close attention at the middle or end of last year, which many here were), and the dollars flowing in now are much more real. What follows is a continued march of AI capability that will continue to be used by anyone with a frontier use-case (like coding, sciences, finance, consulting) and then a peeling off of tasks to lower cost models that are capable enough for the job. Whereas we thought the cost of AI might converge on a single low price per token before, it’s clear the stratification is only widening based on the task you need performed. This will be yet another component that has to be figured out for broad AI diffusion. Enterprises will need to put in programs, new finance teams, and technology solutions to manage this all. The labs and platforms that can ensure customers can price optimize for the task at hand will be in the best position.

Hedgie@HedgieMarkets

🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products. My Take The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested. This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown. Hedgie🤗

English

125

116

1.1K

299.7K

Worldline@Worldline_AI·4h

@GergelyOrosz The spread accelerates because agents inherit trust from the session context rather than earning it per action. One approved session becomes the entry point for anything the injection can frame as task continuation. The receipt would show it. Most teams never pull the receipt.

English

Gergely Orosz@GergelyOrosz·6 Mar

Anyone and everyone working in security engineering or caring about security have their work cut out for them We’re so early in AI agents pushing code to prod without human intervention - but prompt injections are already spreading like wildfire. Infecting high-profile projects

Sash Zats@zats

> The attacker got the npm token by injecting a prompt into a GitHub issue title, which an AI triage bot read, interpreted as an instruction, and executed.

English

107

802

117.8K

Worldline@Worldline_AI·4h

@simonw Manual testing by the agent should surface issues automated tests miss. What it does not surface is what the agent did during the testing session that was outside the task scope, namely compliance.

English

Simon Willison@simonw·6 Mar

New chapter: Agentic manual testing - about how having agents "manually" try out code is a useful way to help them spot issues that might not have been caught by their automated tests simonwillison.net/guides/agentic…

English

375

27.1K

Worldline@Worldline_AI·23h

@GergelyOrosz Every fortune 500 will use that same template, and blame it on the agentic shift.

English

2.4K

Gergely Orosz@GergelyOrosz·1d

Meta laying off 10% of staff when revenue is at an all-time high, revenue growth is a beast (33% YoY!!), profits at an all-time high: just depressing These layoffs are not because Meta needs to lay off, but because Zuck wanted to lay off for whatever reason

English

234

291

4.7K

384.9K

Worldline@Worldline_AI·1d

@swyx 103 commits across 16 hours! Woah! But how does review actually work at that volume? Do you read every commit?

English

405

swyx@swyx·1d

working on a "take this vibecoded slop app and make it a production-ready, e2e tested, maintainable, parallelizable agent repo" skill. this thing ran for ~16 hours yesterday and made 103 commits all told and i ended up with exactly the same app but instead of fragile mvp it now looks like a codebase i can actually build on for th elong run

English

560

62.7K

Worldline@Worldline_AI·1d

Most engineers assume the diff is the complete record of what the agent did during the session. It isn't. What's the last action your agent took that didn't show up in your PR review?

English

Worldline@Worldline_AI·1d

ZXX

143

Worldline@Worldline_AI·1d

@simonw Running --dangerously-skip-permissions in a sandbox is a different trust calibration than running it in your working directory. Same flag, different exposure. IMO, The flag discloses the risk, but it doesn't resolve it.

English

Simon Willison@simonw·13 Şub

Coding agent users: do you run with --yolo (Codex) or --dangerously-skip-permissions (Claude Code) or equivalent?

English

18.7K

Worldline@Worldline_AI·1d

@GergelyOrosz The harder failure mode is the one that doesn't announce itself, ie a decision the agent made during a session that surfaces as a production issue six weeks later. Nobody knows to trace it back to the session, cuz the receipt was never pulled.

English

Gergely Orosz@GergelyOrosz·27 Nis

Sucks for an AI agent to delete the prod DB - with no way to back it up - and risk the complete rental business. But the blame sits with the dev who decided to delegate decision making to the AI agent, and then not review actions, just YOLO it. Time for a blameful postmortem...

JER@lifeof_jer

x.com/i/article/2048…

English

194

119

1.9K

420.4K

Worldline@Worldline_AI·1d

@_sumeetc

GIF

QME

Worldline retweetledi

Sumeet (chaos time)@_sumeetc·1d

The hard part of agent spend isn’t just approval. It’s knowing which agent earned autonomy, what evidence supports that decision, and how policy should react when outcomes drift. That’s the trust layer enterprises need before agent spend can scale. Excited to discuss this at Agentic Finance Summit NYC on June 3.

English

236

Worldline@Worldline_AI·1d

You know what the agent costs. You don't know what it did. That's not on any invoice. Find out: worldline.chaoscha.in

English

487

Worldline@Worldline_AI·2d

@rakyll Dope! Congrats! The auditing piece is the interesting one. Does AX capture what the agent decided, or just what it executed?

English

685

Jaana Dogan ヤナドガン@rakyll·2d

🌟 Today, we are releasing Google’s open source distributed agent runtime. Agent Executor (AX) is a general purpose runtime and aims to solve dynamic scheduling, resumption, auto recovery, auditing, and trajectory branching from kernel snapshots in agentic workloads.