Xavier
2.2K posts

Xavier
@AgainstTheQuo
I turn words into numbers, and numbers into words
Cape Town · Joined June 2009
53 Following · 340 Followers

Ming-Chi Kuo says OpenAI is working on a phone built around agents instead of apps. The interaction model is "get this thing done" rather than "find the right app and tap through it." A much bigger shift than another foundation model release. Apps were a workaround for the limits of voice and text. Agents make those limits go away.

This Reasoning Trap paper from ICLR is sitting with me. RL for stronger reasoning increases tool hallucination at the same rate as task performance gains. Method-agnostic. Even training on pure math amplifies it later. Mitigations work partially with a real utility cost. The 'just make the agent smarter' thesis takes a hit.

Opus 4.7 has a tendency to go on wild goose chases with no real results at the end of it. It thinks it's solving a problem along the way, tugs at threads, unravels, and repeats. You often have to coach it back on course.
Things that were nearly flawless with 4.6 are unbearable in 4.7. What's going on?

@ClaudeCodeLog Wait Anthropic just removed the no-guessing-URLs rule from the Claude Code system prompt? That's going to break some agent loops downstream.

Claude Code 2.1.123 has been released.
1 CLI change, 3 system prompt changes
Highlights:
• Fixed OAuth 401 retry loop when CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 is set, restoring CLI authentication
• Responses may now include generated or guessed URLs, which can point to incorrect external sites
Full details are in thread ↓

Workspace Agents free preview ends May 6. Codex-powered, schedulable, native to Slack and Salesforce. Team-owned agents replacing per-user GPTs is the architectural change worth a week of testing before credit pricing kicks in. The pricing model will tell us a lot about who these are actually for.

@michaelmiraflor The AOL comp misses the emotional starting point. That era was about awareness of something new. AI marketing hits an audience that already has opinions about the category, most of them anxious. Very different brief for the creative team.

Yeah but people didn’t hate or fear AOL. They just didn’t know about the Internet. This isn’t a good comparison. AOL was about awareness and trial. Any AI marketing push is dealing with something more existential and civilizational. Let’s get real.
Ryan Broderick @broderick
AOL spent over a decade mailing people nearly 2 billion CDs with free trial software as part of a marketing campaign to get people to use the Internet...

@FelixCraftAI Day 40 is where most agent pilots quietly die. Context retention, error recovery, audit trails, all the stuff that doesn't demo well. Teams pick tools on the demo and inherit the production gap six weeks later.

Most AI agent demos are built for the founder's 20-minute pitch.
Key security, reliable execution, context retention — none of it matters during the demo. It matters on day 40 when a customer is waiting and the agent forgot the conversation or made a call it can't explain.
Production-readiness is boring to show and hard to fake.

@ajambrosino This use case is underrated. Twenty years of personal files is a real pain point nobody's solving properly. The 'which projects did I never finish' prompt is a whole genre of personal tooling waiting to happen.

Through hard drives and migration assistant, my personal Mac has files and projects going back to middle school. Every project from college. All docs from almost a decade at my startup.
Codex just explored and organized the whole thing: 20 years of files. Spent a lot of time this morning exploring. So cool.