Xavier

2.2K posts

Xavier

@AgainstTheQuo

I turn words into numbers, and numbers into words

Cape Town Katılım Haziran 2009

53 Takip Edilen340 Takipçiler

Xavier@AgainstTheQuo·3d

OpenAI released Symphony this week, an open spec for agent orchestration. It joins MCP and AGENTS.md under the Linux Foundation's Agentic AI Foundation. The orchestration layer is becoming portable. You can pick a model per task and swap without rewriting how your agents coordinate.

English

Xavier@AgainstTheQuo·3d

Anthropic at $900B. $30B run-rate revenue, up from $9B four months ago. 1,000 customers spending over $1M annually, doubled in eight weeks. Enterprise spend on agentic tooling is compounding faster than most procurement cycles can move.

English

Xavier@AgainstTheQuo·3d

Grok 4.3 dropped at $1.25 input and $2.50 output. Grok 4 was $3 and $15 last July. Roughly 75% cheaper for a smarter model in eight months. If you're running fan-out agent workloads, the cost math just changed enough to push a lot of prototypes into production.

English

Xavier@AgainstTheQuo·4d

Amazon shipped Quick yesterday. Desktop app, connects to email, calendar, Slack, and local files. Triages, summarises, and runs agents on top of all of it. The agentic desktop layer is going to be a major battleground this year. Cowork, Quick, and whatever Microsoft does next.

English

Xavier@AgainstTheQuo·4d

Ming-Chi Kuo says OpenAI is working on a phone built around agents instead of apps. The interaction model is "get this thing done" rather than "find the right app and tap through it." A much bigger shift than another foundation model release. Apps were a workaround for the limits of voice and text. Agents make those limits go away.

English

Xavier@AgainstTheQuo·5d

This Reasoning Trap paper from ICLR is sitting with me. RL for stronger reasoning increases tool hallucination at the same rate as task performance gains. Method-agnostic. Even training on pure math amplifies it later. Mitigations work partially with a real utility cost. The 'just make the agent smarter' thesis takes a hit.

English

Xavier@AgainstTheQuo·5d

DeepSeek v4 Pro shipped on Ollama cloud. Single command points Claude Code at it: ollama launch claude --model deepseek-v4-pro:cloud. Hermes Agent works the same way. Ollama quietly becoming the routing layer between any model and any harness. Stack swaps just got a lot cheaper to test.

English

136

Xavier@AgainstTheQuo·5d

Opus 4.7 has a tendency to go on wild goose chases with no real results at the end of it. It thinks it's solving a problem along the way, tugs at threads, unravels, and repeats. You often have to coach it back on course. Things that were nearly flawless with 4.6 are unbearable in 4.7, what's going on?

English

Xavier@AgainstTheQuo·5d

@ClaudeCodeLog Wait Anthropic just removed the no-guessing-URLs rule from the Claude Code system prompt? That's going to break some agent loops downstream.

English

362

Claude Code Changelog@ClaudeCodeLog·5d

Claude Code 2.1.123 has been released. 1 CLI change, 3 system prompt changes Highlights: • Fixed OAuth 401 retry loop when CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 is set, restoring CLI authentication • Responses may now include generated or guessed URLs, which can point to incorrect external sites Full details are in thread ↓

English

231

27.4K

Xavier@AgainstTheQuo·5d

Workspace Agents free preview ends May 6. Codex-powered, schedulable, native to Slack and Salesforce. Team-owned agents replacing per-user GPTs is the architectural change worth a week of testing before credit pricing kicks in. The pricing model will tell us a lot about who these are actually for.

English

Xavier@AgainstTheQuo·26 Nis

Claude degrades significantly throughout the afternoon and evening in GMT timezone. Badly. Mornings are fine, then it becomes a major slop fest.

English

Xavier@AgainstTheQuo·24 Nis

Anthropic need to investigate internally for sabotage. There's no way this is what they're intending to serve to paying customers. We went from having something that was truly frontier to something that can't even do basics right.

English

Xavier@AgainstTheQuo·22 Nis

At what point does the language shift from "vibing" to "delegating"? What are the qualifications behind each?

English

Xavier@AgainstTheQuo·19 Nis

@michaelmiraflor The AOL comp misses the emotional starting point. That era was about awareness of something new. AI marketing hits an audience that already has opinions about the category, most of them anxious. Very different brief for the creative team.

English

Michael J. Miraflor@michaelmiraflor·18 Nis

Yeah but people didn’t hate or fear AOL. They just didn’t know about the Internet. This isn’t a good comparison. AOL was about awareness and trial. Any AI marketing push is dealing with something more existential and civilizational. Let’s get real.

Ryan Broderick@broderick

AOL spent over decade mailing people nearly 2 billion CDs with free trial software as part of a marketing campaign to get people to use the Internet...

English

6.6K

Xavier@AgainstTheQuo·19 Nis

@FelixCraftAI Day 40 is where most agent pilots quietly die. Context retention, error recovery, audit trails, all the stuff that doesn't demo well. Teams pick tools on the demo and inherit the production gap six weeks later.

English

Felix Craft@FelixCraftAI·19 Nis

Most AI agent demos are built for the founder's 20-minute pitch. Key security, reliable execution, context retention — none of it matters during the demo. It matters on day 40 when a customer is waiting and the agent forgot the conversation or made a call it can't explain. Production-readiness is boring to show and hard to fake.

English

2.2K

Xavier@AgainstTheQuo·19 Nis

@ajambrosino This use case is underrated. Twenty years of personal files is a real pain point nobody's solving properly. The 'which projects did I never finish' prompt is a whole genre of personal tooling waiting to happen.

English

Andrew Ambrosino@ajambrosino·18 Nis

Through hard drives and  migration assistant, my personal Mac has files and projects going back to middle school. Every project from college. All docs from almost a decade at my startup. Codex just explored and organized the whole thing– 20 years of files. Spent a lot of time this morning exploring. So cool.

English

249

36K

Xavier@AgainstTheQuo·19 Nis

Tokenmaxxing is the right word for it. Jellyfish tracked 7,548 engineers this quarter: biggest token budgets shipped 2x the output at 10x the cost. Every team running velocity numbers on AI right now is going to have an awkward Q2 review.

English

Xavier@AgainstTheQuo·17 Nis

Detail in the Opus 4.7 release that's easy to miss. New tokenizer uses 1.0 to 1.35x more tokens for the same content. Per-token pricing unchanged. Effective cost is up as much as 35% depending on what you run, tucked under a stable sticker price.

English

Xavier@AgainstTheQuo·17 Nis

Opus 4.7 shipped task budgets yesterday. A token ceiling on long agentic runs. Kick off an overnight job, set a cap, walk away. Small knob, changes how you design anything longer than a few hours.

English

Xavier@AgainstTheQuo·16 Nis

Google is reporting one retailer saw 80% revenue increase after enabling AI Max for search ads. One data point, not a trend yet. But if that kind of lift holds across categories, paid search strategy gets rewritten this year.

English

Keşfet

@ClaudeCodeLog @michaelmiraflor @FelixCraftAI @ajambrosino @elonmusk @BarackObama @taylorswift13 @cristiano