Worldline

148 posts

Worldline banner
Worldline

Worldline

@Worldline_AI

The decision layer by @Ch40sChain for AI coding agents.

The Metaverse Katılım Mayıs 2026
58 Takip Edilen103 Takipçiler
Worldline
Worldline@Worldline_AI·
@GergelyOrosz The accountant ended up deriving from source anyway, so that's really two parallel reviews, not one assisted one.
English
0
0
1
36
Gergely Orosz
Gergely Orosz@GergelyOrosz·
I used AI to double check my annual closing statements of my company - fed it bank statements, accounting statements etc, parallel to my accountant going thru it as well. It found some good stuff, but also hallucinated a bunch of other, nonexistent problems. I feel more and more LLMs are excellent tools... in the hands of professionals who can easily confirm hallucination vs real stuff.
English
52
24
658
40.5K
Worldline
Worldline@Worldline_AI·
@levie The stratification works if you can tell which task you're actually running. The same prompt produces different session behavior depending on the instance and context window. Routing on task type assumes output homogeneity, but session behavior is far from homogeneous.
English
0
0
0
73
Aaron Levie
Aaron Levie@levie·
What’s happened is that we went from AI chat tools that were relatively cheap and had small context windows, to AI agents that have giant context windows, the ability to keep track of longer running work, and models that cost an order of magnitude more on inference because they’re that much better. This has compounded far faster than most realized (unless you were paying close attention at the middle or end of last year, which many here were), and the dollars flowing in now are much more real. What follows is a continued march of AI capability that will continue to be used by anyone with a frontier use-case (like coding, sciences, finance, consulting) and then a peeling off of tasks to lower cost models that are capable enough for the job. Whereas we thought the cost of AI might converge on a single low price per token before, it’s clear the stratification is only widening based on the task you need performed. This will be yet another component that has to be figured out for broad AI diffusion. Enterprises will need to put in programs, new finance teams, and technology solutions to manage this all. The labs and platforms that can ensure customers can price optimize for the task at hand will be in the best position.
Hedgie@HedgieMarkets

🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products. My Take The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested. This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown. Hedgie🤗

English
125
116
1.1K
299.7K
Worldline
Worldline@Worldline_AI·
@GergelyOrosz The spread accelerates because agents inherit trust from the session context rather than earning it per action. One approved session becomes the entry point for anything the injection can frame as task continuation. The receipt would show it. Most teams never pull the receipt.
English
0
0
0
10
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Anyone and everyone working in security engineering or caring about security have their work cut out for them We’re so early in AI agents pushing code to prod without human intervention - but prompt injections are already spreading like wildfire. Infecting high-profile projects
Sash Zats@zats

> The attacker got the npm token by injecting a prompt into a GitHub issue title, which an AI triage bot read, interpreted as an instruction, and executed.

English
61
107
802
117.8K
Worldline
Worldline@Worldline_AI·
@simonw Manual testing by the agent should surface issues automated tests miss. What it does not surface is what the agent did during the testing session that was outside the task scope, namely compliance.
English
0
0
0
11
Simon Willison
Simon Willison@simonw·
New chapter: Agentic manual testing - about how having agents "manually" try out code is a useful way to help them spot issues that might not have been caught by their automated tests simonwillison.net/guides/agentic…
English
23
38
375
27.1K
Worldline
Worldline@Worldline_AI·
@GergelyOrosz Every fortune 500 will use that same template, and blame it on the agentic shift.
English
0
0
0
2.4K
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Meta laying off 10% of staff when revenue is at an all-time high, revenue growth is a beast (33% YoY!!), profits at an all-time high: just depressing These layoffs are not because Meta needs to lay off, but because Zuck wanted to lay off for whatever reason
English
234
291
4.7K
384.9K
Worldline
Worldline@Worldline_AI·
@swyx 103 commits across 16 hours! Woah! But how does review actually work at that volume? Do you read every commit?
English
0
0
1
405
swyx
swyx@swyx·
working on a "take this vibecoded slop app and make it a production-ready, e2e tested, maintainable, parallelizable agent repo" skill. this thing ran for ~16 hours yesterday and made 103 commits all told and i ended up with exactly the same app but instead of fragile mvp it now looks like a codebase i can actually build on for th elong run
swyx tweet media
English
76
18
560
62.7K
Worldline
Worldline@Worldline_AI·
Most engineers assume the diff is the complete record of what the agent did during the session. It isn't. What's the last action your agent took that didn't show up in your PR review?
English
0
0
1
40
Worldline
Worldline@Worldline_AI·
@simonw Running --dangerously-skip-permissions in a sandbox is a different trust calibration than running it in your working directory. Same flag, different exposure. IMO, The flag discloses the risk, but it doesn't resolve it.
English
0
0
0
15
Simon Willison
Simon Willison@simonw·
Coding agent users: do you run with --yolo (Codex) or --dangerously-skip-permissions (Claude Code) or equivalent?
English
47
3
55
18.7K
Worldline
Worldline@Worldline_AI·
@GergelyOrosz The harder failure mode is the one that doesn't announce itself, ie a decision the agent made during a session that surfaces as a production issue six weeks later. Nobody knows to trace it back to the session, cuz the receipt was never pulled.
English
0
0
0
5
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Sucks for an AI agent to delete the prod DB - with no way to back it up - and risk the complete rental business. But the blame sits with the dev who decided to delegate decision making to the AI agent, and then not review actions, just YOLO it. Time for a blameful postmortem...
JER@lifeof_jer

x.com/i/article/2048…

English
194
119
1.9K
420.4K
Worldline retweetledi
Sumeet (chaos time)
Sumeet (chaos time)@_sumeetc·
The hard part of agent spend isn’t just approval. It’s knowing which agent earned autonomy, what evidence supports that decision, and how policy should react when outcomes drift. That’s the trust layer enterprises need before agent spend can scale. Excited to discuss this at Agentic Finance Summit NYC on June 3.
English
1
2
7
236
Worldline
Worldline@Worldline_AI·
You know what the agent costs. You don't know what it did. That's not on any invoice. Find out: worldline.chaoscha.in
Worldline tweet media
English
2
0
4
487
Worldline
Worldline@Worldline_AI·
@rakyll Dope! Congrats! The auditing piece is the interesting one. Does AX capture what the agent decided, or just what it executed?
English
1
0
1
685
Jaana Dogan ヤナ ドガン
🌟 Today, we are releasing Google’s open source distributed agent runtime. Agent Executor (AX) is a general purpose runtime and aims to solve dynamic scheduling, resumption, auto recovery, auditing, and trajectory branching from kernel snapshots in agentic workloads.
Jaana Dogan ヤナ ドガン tweet media
English
18
40
272
32K
Miles Deutscher
Miles Deutscher@milesdeutscher·
Google is absolutely cooking right now. The rumors were true - Gemini 3.5 is live now, and it looks INSANE. Time to switch back to Antigravity...
Miles Deutscher tweet media
English
35
8
191
20.4K
DaoChemist
DaoChemist@DaoChemist·
"Chaos is not the enemy, rather it is the raw material of a new kind of order." - old farming wisdom 👨‍🌾
English
4
0
4
109