phantom.ctx
@phantomctx

Building the home for Science 🔬

NYC🗽 · Joined April 2021
7K posts · 2.9K Following · 273 Followers
phantom.ctx retweeted
Mark Kretschmann @mark_k
xAI has launched grok-voice-think-fast-1.0, its new flagship voice model. The release brings a major upgrade to voice AI for real business use. The system is now available through the @xai API and powers voice features in Grok across web, iOS, Android, and X. The model handles complex, multi-step tasks with ease. It manages ambiguous requests, high-volume tool calls, and precise data collection in areas such as customer support and sales. Real-time reasoning runs in the background with zero added latency, keeping conversations fast and natural. Performance stays strong even in tough conditions. It copes with background noise, heavy accents, interruptions, and quick speech while supporting more than 25 languages. These traits make it ready for global enterprise deployments.
3
4
41
1.8K
phantom.ctx retweeted
alex fazio @alxfazio
5.5 uses uv instead of pip by default in a clean environment
alex fazio tweet media
49
69
2.8K
91.1K
phantom.ctx retweeted
Aakash Gupta @aakashgupta
OpenAI just shipped an AI whose only job is to babysit another AI and decide if it's allowed to keep working.

Codex's "auto-review" runs a separate guardian sub-agent alongside the main coding agent. Every time the main agent tries to execute a command outside the sandbox, the guardian evaluates the risk in context and either approves it or escalates to you. The human never sees the routine approvals. You only get pulled in when the guardian flags something genuinely dangerous.

The problem this solves is approval fatigue, and every coding agent has it: Claude Code, Cursor, Windsurf. The more capable the agent, the more actions it needs to take outside the sandbox, and the more it interrupts you. Studies on alert fatigue in medicine show the pattern: the 47th approval prompt gets the same reflexive "yes" as the 3rd, regardless of risk. The safety mechanism stops being safe the moment humans stop reading it.

Claude Code attacked this from the other direction: trust settings, auto-accept modes, permission scoping. You configure the guardrails up front, and the agent runs within them. OpenAI's approach is different: the guardrails are themselves an AI that evaluates each action against what the agent is actually trying to accomplish.

Both approaches reveal the same constraint. The human-in-the-loop was always a temporary architecture. At 5 approvals per session, a human adds genuine safety. At 40 approvals per session, a human adds latency and rubber stamps. The math on human attention doesn't scale with agent capability.

This is where all of agentic coding lands within 12 months. The agent does the work. A second system reviews the work. The human reviews the output. Three layers. And the one shrinking fastest is the human layer in the middle.
OpenAI Developers @OpenAIDevs

Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate agent checks higher-risk steps in context before they run.
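A minimal sketch of the guardian-review pattern the two posts above describe: a main agent proposes out-of-sandbox commands, a second reviewer approves routine ones and escalates only flagged ones to the human. All names (`Guardian`, `review`, `risky_markers`) and the string-matching risk heuristic are hypothetical illustrations, not OpenAI's actual auto-review implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Guardian:
    """Second agent that reviews each out-of-sandbox command in context."""
    # Toy stand-in for a contextual risk evaluation (assumption, not real API).
    risky_markers: tuple = ("rm -rf", "sudo", "curl http")
    escalations: list = field(default_factory=list)

    def review(self, command: str, context: str) -> str:
        # Escalate only commands that look genuinely dangerous;
        # routine actions are approved without interrupting the human.
        if any(marker in command for marker in self.risky_markers):
            self.escalations.append((command, context))
            return "escalate"
        return "approve"


# The main agent proposes commands; the guardian filters them.
guardian = Guardian()
decisions = [guardian.review(cmd, context="running the test suite")
             for cmd in ["pytest -q", "go build ./...", "rm -rf /tmp/build"]]
print(decisions)             # ['approve', 'approve', 'escalate']
print(guardian.escalations)  # only the flagged command reaches the human
```

Under this pattern the interrupt count scales with actual risk rather than with total action count, which is the approval-fatigue fix the post argues for.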

8
20
137
16.8K
phantom.ctx retweeted
spicylemonade @spicey_lemonade
GPT-5.5 Pro might be the first model to legitimately solve IMO P6. Details: GPT-5.5 Pro and Claude 4.7 Opus are the only models capable of solving IMO P6. Given that Claude 4.7 was able to solve it, I was more inclined to believe this is due to contamination. However, the model says its knowledge cutoff is August 2025, while Claude's is January 2026. IMO 2025 ended at the end of July, so with a knowledge cutoff of early August, the answers likely did not make it in. Furthermore, I asked 5.5 about the biggest news headlines of August 2025 and it didn't seem to remember anything from early August, so the cutoff may be right at the end of July. An argument for GPT-5.5 Pro is that it does not mention the IMO problem anywhere in its thinking, while Opus 4.7 mentions the exact problem: @j_dekoninck @SebastienBubeck
spicylemonade tweet media (×3)
13
20
204
17.2K
phantom.ctx retweeted
Sebastian Aaltonen @SebAaltonen
Pro tip: Don't buy extra tokens for your $200/month Codex 5.5 plan. Have a personal $20/month plan and switch to it during the 2-hour cooldown periods. I am using the latest xhigh model, and I have never bought tokens and never waited. And I do big refactorings regularly.
8
4
270
17.9K
phantom.ctx retweeted
Simon Smith @_simonsmith
Codex can't add to or edit Figma files the way Claude can. I think I've discovered why, and it's a fixable bug.

The issue is that while both Codex and Claude can read Figma files via the Figma connector, Codex can't add to or edit them. This is strange, because Claude can do it, and does it well. I've traced the issue to a mismatch between the actions the Figma plugin DOCUMENTS and the actions the ChatGPT Figma connector (at least in Enterprise) EXPOSES. The plugin tells Codex to use use_figma, create_new_file, and search_design_system, but these actions are not available to approve in the ChatGPT Figma connector. They are available in the Claude Figma connector.

I'm not sure whether the ChatGPT/Codex team or Figma needs to address this, but right now it's a glaring gap for me between Codex and Claude that inhibits workflows involving Figma.
Simon Smith tweet media
5
1
29
3K
phantom.ctx retweeted
Tom Turney @no_stp_on_snek
Native Swift/Metal backend for vLLM on Apple Silicon. No Python in the inference hot path → better throughput + scaling. Try it: brew tap TheTom/tap && brew install vllm-swift Looking for beta testers → github.com/TheTom/vllm-sw…
Tom Turney tweet media
20
57
469
41.1K
phantom.ctx retweeted
nico @nicochristie
We have been testing GPT-5.5 on the hardest spreadsheet tasks in the world (complex models with 100k to 1M+ cells). It sits on the Pareto frontier for spreadsheets -- SOTA accuracy, and the fastest and most efficient public model across effort levels. OAI really cooked here
nico tweet media
11
67
971
65.2K
phantom.ctx retweeted
Sherwin Wu @sherwinwu
Set Codex to this and never look back. Medium reasoning effort is good enough for me for ~anything I need to do now.
Sherwin Wu tweet media
86
24
1.5K
100.7K
phantom.ctx retweeted
signüll @signulll
openai's product execution & velocity has stepped up noticeably, & the tone feels more human again. it felt corporate for a while in the middle. w/ the recent releases incl 5.5 you can feel the real focus & polish showing through again. credit where it's due cuz the work on agents & codex is game changing stuff for the broader economy. comms feels tighter & again much more relatable. something changed. they clearly took the feedback to heart. begs the question, is openai pivoting away from consumer stuff? or at least it's p1 instead of p0 now. that would be a big shift.
Sam Altman @sama

These are cool! I think most companies will want to use them.

36
28
821
68.2K
phantom.ctx retweeted
jason liu @jxnlco
As models get smarter, contradictions in your codebase and prompts are becoming more expensive. The more contradictions there are, the more the model needs a reason to decide under what circumstances each of the rules you lay out applies. It's kind of like when I talk to my girlfriend: you said this, but you also said that, so I don't know what to do in this situation...
9
3
130
6.9K
phantom.ctx retweeted
NVIDIA @nvidia
Efficiency isn't just about speed anymore — it's about the massive reduction in the cost of intelligence. NVIDIA and @OpenAI's partnership leverages the GB200 NVL72 to deliver a 35x reduction in token costs, bringing enterprise-grade AI to an unprecedented scale. Trained and served on NVIDIA GB200 NVL72 systems, GPT-5.5 delivers the sustained performance required for execution-heavy, multi-step work — and at NVIDIA, that means teams are now scaling human ingenuity with OpenAI Codex Agents.
NVIDIA tweet media
64
142
1.5K
66.2K
phantom.ctx retweeted
Claude @claudeai
Memory on Claude Managed Agents is now in public beta. Your agents can now learn from every session, using an intelligence-optimized memory layer that balances performance with flexibility.
Claude tweet media
271
525
7.7K
395.3K
phantom.ctx retweeted
Jonas @JonasBadalic
Pleased to announce that we've made @sentry slightly denser, and cleaned up the layout.
Jonas tweet media
3
4
68
5.1K
phantom.ctx retweeted
Max Weinbach @mweinbach
This feels like the fastest new model rollout from OpenAI
10
4
169
6.8K
phantom.ctx retweeted
Parallel Web Systems
The best web search for agents is now free. Upgrade to Parallel's web search tools in any MCP-supported tool or agent, for free, in under 60 seconds. No account. No API keys. Zero cost. docs.parallel.ai/integrations/m…
GIF
13
31
279
140.3K
phantom.ctx retweeted
Lenny Rachitsky @lennysan
Claude Code's Head of Product: "The hardest PM skill right now is how to be the right amount of AGI-pilled."
Lenny Rachitsky @lennysan

How Anthropic’s product team moves faster than anyone else. I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled. We discuss:
🔸 How Anthropic’s shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat’s most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)
Listen now 👇 youtu.be/PplmzlgE0kg

40
76
839
159.3K
phantom.ctx retweeted
Dan McAteer @daniel_mac8
GPT-5.5 beats Opus 4.7 on several benchmarks, especially those related to agentic coding and tool calling. It's also pretty damn close to Claude Mythos... it even beats Mythos on Terminal-Bench 2.0. What's more, GPT-5.5 is far more token efficient than Opus 4.7. OpenAI cooked this Spud 🥔.
Dan McAteer tweet media (×2)
26
14
187
8.2K