phantom.ctx
@phantomctx

Building the home for Science 🔬

NYC🗽 · Joined April 2021
7K posts · 2.9K Following · 273 Followers
phantom.ctx retweeted
Mark Kretschmann @mark_k
xAI has launched grok-voice-think-fast-1.0, its new flagship voice model. The release brings a major upgrade to voice AI for real business use. The system is now available through the @xai API and powers voice features in Grok across web, iOS, Android, and X. The model handles complex, multi-step tasks with ease. It manages ambiguous requests, high-volume tool calls, and precise data collection in areas such as customer support and sales. Real-time reasoning runs in the background with zero added latency, keeping conversations fast and natural. Performance stays strong even in tough conditions. It copes with background noise, heavy accents, interruptions, and quick speech while supporting more than 25 languages. These traits make it ready for global enterprise deployments.
3
4
41
1.8K
phantom.ctx retweeted
alex fazio @alxfazio
5.5 uses uv instead of pip by default in a clean environment
alex fazio tweet media
49
69
2.8K
91.1K
phantom.ctx retweeted
Aakash Gupta @aakashgupta
OpenAI just shipped an AI whose only job is to babysit another AI and decide if it's allowed to keep working.

Codex's "auto-review" runs a separate guardian sub-agent alongside the main coding agent. Every time the main agent tries to execute a command outside the sandbox, the guardian evaluates the risk in context and either approves it or escalates to you. The human never sees the routine approvals. You only get pulled in when the guardian flags something genuinely dangerous.

The problem this solves is approval fatigue, and every coding agent has it: Claude Code, Cursor, Windsurf. The more capable the agent, the more actions it needs to take outside the sandbox, and the more it interrupts you. Studies on alert fatigue in medicine show the pattern: the 47th approval prompt gets the same reflexive "yes" as the 3rd, regardless of risk. The safety mechanism stops being safe the moment humans stop reading it.

Claude Code attacked this from the other direction: trust settings, auto-accept modes, permission scoping. You configure the guardrails up front, and the agent runs within them. OpenAI's approach is different: the guardrails are themselves an AI that evaluates each action against what the agent is actually trying to accomplish.

Both approaches reveal the same constraint. The human-in-the-loop was always a temporary architecture. At 5 approvals per session, a human adds genuine safety. At 40 approvals per session, a human adds latency and rubber stamps. The math on human attention doesn't scale with agent capability.

This is where all of agentic coding lands within 12 months. The agent does the work. A second system reviews the work. The human reviews the output. Three layers. And the one shrinking fastest is the human layer in the middle.
OpenAI Developers @OpenAIDevs

Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate agent checks higher-risk steps in context before they run.
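A minimal sketch of the guardian-review pattern the two posts above describe: a main agent proposes out-of-sandbox commands, a second reviewer approves routine ones and escalates only flagged ones to the human. All names (`Guardian`, `review`, `risky_markers`) and the string-matching risk heuristic are hypothetical illustrations, not OpenAI's actual auto-review implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Guardian:
    """Second agent that reviews each out-of-sandbox command in context."""
    # Toy stand-in for a contextual risk evaluation (assumption, not real API).
    risky_markers: tuple = ("rm -rf", "sudo", "curl http")
    escalations: list = field(default_factory=list)

    def review(self, command: str, context: str) -> str:
        # Escalate only commands that look genuinely dangerous;
        # routine actions are approved without interrupting the human.
        if any(marker in command for marker in self.risky_markers):
            self.escalations.append((command, context))
            return "escalate"
        return "approve"


# The main agent proposes commands; the guardian filters them.
guardian = Guardian()
decisions = [guardian.review(cmd, context="running the test suite")
             for cmd in ["pytest -q", "go build ./...", "rm -rf /tmp/build"]]
print(decisions)             # ['approve', 'approve', 'escalate']
print(guardian.escalations)  # only the flagged command reaches the human
```

Under this pattern the interrupt count scales with actual risk rather than with total action count, which is the approval-fatigue fix the post argues for.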

8
20
137
16.8K
phantom.ctx retweeted
spicylemonade @spicey_lemonade
GPT-5.5 Pro might be the first model to legitimately solve IMO P6. Details: GPT-5.5 Pro and Claude 4.7 Opus are the only models capable of solving IMO P6. Given that Claude 4.7 was able to solve it, I was more inclined to believe this is due to contamination. However, the model says its knowledge cutoff is August 2025, while Claude's is January 2026. IMO 2025 ended at the end of July, so with a knowledge cutoff of early August, the answers likely did not make it in. Furthermore, I asked 5.5 about the biggest news headlines of August 2025 and it didn't seem to remember anything from early August, so the cutoff may be right at the end of July. An argument for GPT-5.5 Pro is that it does not mention the IMO problem anywhere in its thinking, while Opus 4.7 mentions the exact problem: @j_dekoninck @SebastienBubeck
spicylemonade tweet media (×3)
13
20
204
17.2K
phantom.ctx retweeted
Sebastian Aaltonen @SebAaltonen
Pro tip: Don't buy extra tokens for your $200/month Codex 5.5 plan. Have a personal $20/month plan and switch to it during the 2-hour cooldown periods. I am using the latest xhigh model, and I have never bought tokens and never waited. And I do big refactorings regularly.
8
4
270
17.9K
phantom.ctx retweeted
Simon Smith @_simonsmith
Codex can't add to or edit Figma files the way Claude can. I think I've discovered why, and it's a fixable bug.

The issue is that while both Codex and Claude can read Figma files via the Figma connector, Codex can't add to or edit them. This is strange, because Claude can do it, and does it well. I've traced the issue to a mismatch between the actions the Figma plugin DOCUMENTS and the actions the ChatGPT Figma connector (at least in Enterprise) EXPOSES. The plugin tells Codex to use use_figma, create_new_file, and search_design_system, but these actions are not available to approve in the ChatGPT Figma connector. They are available in the Claude Figma connector.

I'm not sure whether the ChatGPT/Codex team or Figma needs to address this, but right now it's a glaring gap for me between Codex and Claude that inhibits workflows involving Figma.
Simon Smith tweet media
5
1
29
3K
phantom.ctx retweeted
Tom Turney @no_stp_on_snek
Native Swift/Metal backend for vLLM on Apple Silicon. No Python in the inference hot path → better throughput + scaling. Try it: brew tap TheTom/tap && brew install vllm-swift Looking for beta testers → github.com/TheTom/vllm-sw…
Tom Turney tweet media
20
57
469
41.1K
phantom.ctx retweeted
nico @nicochristie
We have been testing GPT-5.5 on the hardest spreadsheet tasks in the world (complex models with 100k to 1M+ cells). It sits on the Pareto frontier for spreadsheets -- SOTA accuracy, and the fastest and most efficient public model across effort levels. OAI really cooked here
nico tweet media
11
67
971
65.2K
phantom.ctx retweeted
Sherwin Wu @sherwinwu
Set Codex to this and never look back. Medium reasoning effort is good enough for me for ~anything I need to do now.
Sherwin Wu tweet media
86
24
1.5K
100.7K
phantom.ctx retweeted
signüll @signulll
openai's product execution & velocity has stepped up noticeably, & the tone feels more human again. it felt corporate for a while in the middle. w/ the recent releases incl 5.5 you can feel the real focus & polish showing through again. credit where it's due cuz the work on agents & codex is game changing stuff for the broader economy. comms feels tighter & again much more relatable. something changed. they clearly took the feedback to heart. begs the question, is openai pivoting away from consumer stuff? or at least it's p1 instead of p0 now. that would be a big shift.
Sam Altman @sama

These are cool! I think most companies will want to use them.

36
28
821
68.2K
phantom.ctx retweeted
jason liu @jxnlco
As models get smarter, contradictions in your codebase and prompts are becoming more expensive. The more contradictions there are, the more the model needs a reason to decide under what circumstances each of the rules you lay out applies. It's kind of like when I talk to my girlfriend: you said this, but you also said that, so I don't know what to do in this situation...
9
3
130
6.9K
phantom.ctx retweeted
NVIDIA @nvidia
Efficiency isn't just about speed anymore — it's about the massive reduction in the cost of intelligence. NVIDIA and @OpenAI's partnership leverages the GB200 NVL72 to deliver a 35x reduction in token costs, bringing enterprise-grade AI to an unprecedented scale. Trained and served on NVIDIA GB200 NVL72 systems, GPT-5.5 delivers the sustained performance required for execution-heavy, multi-step work — and at NVIDIA, that means teams are now scaling human ingenuity with OpenAI Codex Agents.
NVIDIA tweet media
64
142
1.5K
66.2K
phantom.ctx retweeted
Claude @claudeai
Memory on Claude Managed Agents is now in public beta. Your agents can now learn from every session, using an intelligence-optimized memory layer that balances performance with flexibility.
Claude tweet media
271
525
7.7K
395.3K
phantom.ctx retweeted
Jonas @JonasBadalic
Pleased to announce that we've made @sentry slightly denser, and cleaned up the layout.
Jonas tweet media
3
4
68
5.1K
phantom.ctx retweeted
Max Weinbach @mweinbach
This feels like the fastest new model rollout from OpenAI
10
4
169
6.8K
phantom.ctx retweeted
Parallel Web Systems
The best web search for agents is now free. Upgrade to Parallel's web search tools in any MCP-supported tool or agent, for free, in under 60 seconds. No account. No API keys. Zero cost. docs.parallel.ai/integrations/m…
GIF
13
31
279
140.3K
phantom.ctx retweeted
Lenny Rachitsky @lennysan
Claude Code's Head of Product: "The hardest PM skill right now is how to be the right amount of AGI-pilled."
Lenny Rachitsky @lennysan

How Anthropic’s product team moves faster than anyone else. I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled. We discuss:
🔸 How Anthropic’s shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat’s most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)
Listen now 👇 youtu.be/PplmzlgE0kg

40
76
839
159.3K
phantom.ctx retweeted
Dan McAteer @daniel_mac8
GPT-5.5 beats Opus 4.7 on several benchmarks, especially those related to agentic coding and tool calling. It's also pretty damn close to Claude Mythos... it even beats Mythos on Terminal-Bench 2.0. What's more, GPT-5.5 is far more token efficient than Opus 4.7. OpenAI cooked this Spud 🥔.
Dan McAteer tweet media (×2)
26
14
187
8.2K