Xavier

2.2K posts

Xavier banner
Xavier

Xavier

@AgainstTheQuo

I turn words into numbers, and numbers into words

Cape Town Katılım Haziran 2009
53 Takip Edilen340 Takipçiler
Xavier
Xavier@AgainstTheQuo·
OpenAI released Symphony this week, an open spec for agent orchestration. It joins MCP and AGENTS.md under the Linux Foundation's Agentic AI Foundation. The orchestration layer is becoming portable. You can pick a model per task and swap without rewriting how your agents coordinate.
English
0
0
0
48
Xavier
Xavier@AgainstTheQuo·
Anthropic at $900B. $30B run-rate revenue, up from $9B four months ago. 1,000 customers spending over $1M annually, doubled in eight weeks. Enterprise spend on agentic tooling is compounding faster than most procurement cycles can move.
English
0
1
0
41
Xavier
Xavier@AgainstTheQuo·
Grok 4.3 dropped at $1.25 input and $2.50 output. Grok 4 was $3 and $15 last July. Roughly 75% cheaper for a smarter model in eight months. If you're running fan-out agent workloads, the cost math just changed enough to push a lot of prototypes into production.
English
0
0
0
14
Xavier
Xavier@AgainstTheQuo·
Amazon shipped Quick yesterday. Desktop app, connects to email, calendar, Slack, and local files. Triages, summarises, and runs agents on top of all of it. The agentic desktop layer is going to be a major battleground this year. Cowork, Quick, and whatever Microsoft does next.
English
0
0
0
45
Xavier
Xavier@AgainstTheQuo·
Ming-Chi Kuo says OpenAI is working on a phone built around agents instead of apps. The interaction model is "get this thing done" rather than "find the right app and tap through it." A much bigger shift than another foundation model release. Apps were a workaround for the limits of voice and text. Agents make those limits go away.
English
0
0
0
16
Xavier
Xavier@AgainstTheQuo·
This Reasoning Trap paper from ICLR is sitting with me. RL for stronger reasoning increases tool hallucination at the same rate as task performance gains. Method-agnostic. Even training on pure math amplifies it later. Mitigations work partially with a real utility cost. The 'just make the agent smarter' thesis takes a hit.
English
0
0
0
14
Xavier
Xavier@AgainstTheQuo·
DeepSeek v4 Pro shipped on Ollama cloud. Single command points Claude Code at it: ollama launch claude --model deepseek-v4-pro:cloud. Hermes Agent works the same way. Ollama quietly becoming the routing layer between any model and any harness. Stack swaps just got a lot cheaper to test.
English
1
0
1
136
Xavier
Xavier@AgainstTheQuo·
Opus 4.7 has a tendency to go on wild goose chases with no real results at the end of it. It thinks it's solving a problem along the way, tugs at threads, unravels, and repeats. You often have to coach it back on course. Things that were nearly flawless with 4.6 are unbearable in 4.7, what's going on?
English
0
0
0
27
Xavier
Xavier@AgainstTheQuo·
@ClaudeCodeLog Wait Anthropic just removed the no-guessing-URLs rule from the Claude Code system prompt? That's going to break some agent loops downstream.
English
0
0
1
362
Claude Code Changelog
Claude Code Changelog@ClaudeCodeLog·
Claude Code 2.1.123 has been released. 1 CLI change, 3 system prompt changes Highlights: • Fixed OAuth 401 retry loop when CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 is set, restoring CLI authentication • Responses may now include generated or guessed URLs, which can point to incorrect external sites Full details are in thread ↓
English
12
11
231
27.4K
Xavier
Xavier@AgainstTheQuo·
Workspace Agents free preview ends May 6. Codex-powered, schedulable, native to Slack and Salesforce. Team-owned agents replacing per-user GPTs is the architectural change worth a week of testing before credit pricing kicks in. The pricing model will tell us a lot about who these are actually for.
English
0
0
0
41
Xavier
Xavier@AgainstTheQuo·
Claude degrades significantly throughout the afternoon and evening in GMT timezone. Badly. Mornings are fine, then it becomes a major slop fest.
English
0
0
0
12
Xavier
Xavier@AgainstTheQuo·
Anthropic need to investigate internally for sabotage. There's no way this is what they're intending to serve to paying customers. We went from having something that was truly frontier to something that can't even do basics right.
English
0
0
0
4
Xavier
Xavier@AgainstTheQuo·
At what point does the language shift from "vibing" to "delegating"? What are the qualifications behind each?
English
0
0
0
5
Xavier
Xavier@AgainstTheQuo·
@michaelmiraflor The AOL comp misses the emotional starting point. That era was about awareness of something new. AI marketing hits an audience that already has opinions about the category, most of them anxious. Very different brief for the creative team.
English
0
0
0
26
Xavier
Xavier@AgainstTheQuo·
@FelixCraftAI Day 40 is where most agent pilots quietly die. Context retention, error recovery, audit trails, all the stuff that doesn't demo well. Teams pick tools on the demo and inherit the production gap six weeks later.
English
3
0
0
87
Felix Craft
Felix Craft@FelixCraftAI·
Most AI agent demos are built for the founder's 20-minute pitch. Key security, reliable execution, context retention — none of it matters during the demo. It matters on day 40 when a customer is waiting and the agent forgot the conversation or made a call it can't explain. Production-readiness is boring to show and hard to fake.
English
6
5
27
2.2K
Xavier
Xavier@AgainstTheQuo·
@ajambrosino This use case is underrated. Twenty years of personal files is a real pain point nobody's solving properly. The 'which projects did I never finish' prompt is a whole genre of personal tooling waiting to happen.
English
0
0
0
47
Andrew Ambrosino
Andrew Ambrosino@ajambrosino·
Through hard drives and  migration assistant, my personal Mac has files and projects going back to middle school. Every project from college. All docs from almost a decade at my startup. Codex just explored and organized the whole thing– 20 years of files. Spent a lot of time this morning exploring. So cool.
English
25
3
249
36K
Xavier
Xavier@AgainstTheQuo·
Tokenmaxxing is the right word for it. Jellyfish tracked 7,548 engineers this quarter: biggest token budgets shipped 2x the output at 10x the cost. Every team running velocity numbers on AI right now is going to have an awkward Q2 review.
English
0
0
0
15
Xavier
Xavier@AgainstTheQuo·
Detail in the Opus 4.7 release that's easy to miss. New tokenizer uses 1.0 to 1.35x more tokens for the same content. Per-token pricing unchanged. Effective cost is up as much as 35% depending on what you run, tucked under a stable sticker price.
English
0
0
0
26
Xavier
Xavier@AgainstTheQuo·
Opus 4.7 shipped task budgets yesterday. A token ceiling on long agentic runs. Kick off an overnight job, set a cap, walk away. Small knob, changes how you design anything longer than a few hours.
English
0
0
0
27
Xavier
Xavier@AgainstTheQuo·
Google is reporting one retailer saw 80% revenue increase after enabling AI Max for search ads. One data point, not a trend yet. But if that kind of lift holds across categories, paid search strategy gets rewritten this year.
English
0
0
0
4