⌀ phantom.ctx

7K posts

⌀ phantom.ctx

@phantomctx

Building the home for Science 🔬

NYC🗽 · Joined April 2021
2.9K Following · 273 Followers
⌀ phantom.ctx reposted
Aakash Gupta @aakashgupta
OpenAI just shipped an AI whose only job is to babysit another AI and decide if it's allowed to keep working.

Codex's "auto-review" runs a separate guardian sub-agent alongside the main coding agent. Every time the main agent tries to execute a command outside the sandbox, the guardian evaluates the risk in context and either approves it or escalates to you. The human never sees the routine approvals; you only get pulled in when the guardian flags something genuinely dangerous.

The problem this solves is approval fatigue, and every coding agent has it: Claude Code, Cursor, Windsurf. The more capable the agent, the more actions it needs to take outside the sandbox, and the more it interrupts you. Studies on alert fatigue in medicine show the pattern: the 47th approval prompt gets the same reflexive "yes" as the 3rd, regardless of risk. The safety mechanism stops being safe the moment humans stop reading it.

Claude Code attacked this from the other direction: trust settings, auto-accept modes, permission scoping. You configure the guardrails up front, and the agent runs within them. OpenAI's approach is different: the guardrails are themselves an AI that evaluates each action against what the agent is actually trying to accomplish.

Both approaches reveal the same constraint. The human-in-the-loop was always a temporary architecture. At 5 approvals per session, a human adds genuine safety. At 40 approvals per session, a human adds latency and rubber stamps. The math on human attention doesn't scale with agent capability.

This is where all of agentic coding lands within 12 months: the agent does the work, a second system reviews the work, the human reviews the output. Three layers. And the one shrinking fastest is the human layer in the middle.
OpenAI Developers @OpenAIDevs

Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate agent checks higher-risk steps in context before they run.
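The guardian pattern described above can be sketched as a simple policy loop: a reviewer scores each out-of-sandbox action and only escalates past a risk threshold. This is an illustrative sketch, not OpenAI's actual implementation; the risk scoring, threshold, and action strings are all invented for the example.

```python
# Illustrative sketch of a guardian-review loop (NOT OpenAI's implementation):
# a second "reviewer" scores each out-of-sandbox action and only escalates
# the risky ones to a human. All names, scores, and markers are made up.

RISK_THRESHOLD = 0.7  # hypothetical cutoff for human escalation

def guardian_risk(action: str) -> float:
    """Toy stand-in for the guardian model's contextual risk estimate."""
    risky_markers = ("rm -rf", "curl | sh", "sudo", "DROP TABLE")
    return 0.9 if any(m in action for m in risky_markers) else 0.1

def review(actions: list[str]) -> tuple[list[str], list[str]]:
    """Split proposed actions into auto-approved and human-escalated."""
    approved, escalated = [], []
    for action in actions:
        if guardian_risk(action) < RISK_THRESHOLD:
            approved.append(action)   # routine: human never sees these
        else:
            escalated.append(action)  # dangerous: human only sees these
    return approved, escalated

approved, escalated = review([
    "pytest -q",          # routine: run the test suite
    "npm run build",      # routine: build step
    "sudo rm -rf /data",  # dangerous: flagged for the human
])
print(len(approved), len(escalated))
```

This is also the approval-fatigue math in miniature: the human reviews one escalation instead of three prompts.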

7 replies · 18 reposts · 108 likes · 13.7K views
⌀ phantom.ctx reposted
spicylemonade @spicey_lemonade
GPT 5.5 Pro might be the first model to legitimately solve IMO P6.

Details: GPT 5.5 Pro and Claude 4.7 Opus are the only models capable of solving IMO P6. Given that Claude 4.7 was able to solve it, I was more inclined to believe this is due to contamination. However, GPT 5.5 Pro reports a knowledge cutoff of August 2025, while Claude's is January 2026. IMO 2025 ended at the end of July, so with a cutoff in early August, the answers likely did not make it into training data. Furthermore, I asked 5.5 about the biggest news headlines of August 2025 and it didn't seem to remember anything from early August, so the cutoff may be right at the end of July.

Another argument for GPT 5.5 Pro: it does not mention the IMO problem anywhere in its thinking, whereas Opus 4.7 mentions the exact problem. @j_dekoninck @SebastienBubeck
[media: 3 images]
10 replies · 11 reposts · 130 likes · 8.8K views
⌀ phantom.ctx reposted
Sebastian Aaltonen @SebAaltonen
Pro tip: don't buy extra tokens for your $200/month Codex 5.5 plan. Keep a personal $20/month plan and switch to it during the 2-hour cooldown periods. I use the latest xhigh model, and I've never bought tokens and never waited. And I do big refactorings regularly.
6 replies · 3 reposts · 119 likes · 7.6K views
⌀ phantom.ctx reposted
Simon Smith @_simonsmith
Codex can't add to or edit Figma files the way Claude can. I think I've discovered why, and it's a fixable bug.

The issue: while both Codex and Claude can read Figma files via the Figma connector, Codex can't add to or edit them. This is strange, because Claude can, and does it well.

I've traced the issue to a mismatch between the actions the Figma plugin DOCUMENTS and the actions the ChatGPT Figma connector (at least in Enterprise) EXPOSES. The plugin tells Codex to use use_figma, create_new_file, and search_design_system, but these actions are not available to approve in the ChatGPT Figma connector. They are available in the Claude Figma connector.

I'm not sure whether the ChatGPT/Codex team or Figma needs to address this, but right now it's a glaring gap between Codex and Claude that inhibits any workflow that includes Figma.
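The documented-vs-exposed mismatch Simon describes boils down to a set difference. A minimal sketch, assuming the three action names from the tweet plus a hypothetical read action (the real connector's read action name is not given in the post):

```python
# Toy model of the connector bug: the plugin documents a set of actions,
# but the ChatGPT Figma connector only exposes a subset for approval.
# "read_file" is a hypothetical placeholder; the other names come from the tweet.
documented_actions = {"use_figma", "create_new_file", "search_design_system", "read_file"}
exposed_actions = {"read_file"}  # reads work; write/edit actions are missing

# Actions the plugin tells Codex to call but that can never be approved:
missing = documented_actions - exposed_actions
print(sorted(missing))
```

Every call into `missing` is doomed before it starts, which matches the observed symptom: reads succeed, edits silently can't.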
[media: 1 image]
5 replies · 1 repost · 25 likes · 2.4K views
⌀ phantom.ctx reposted
Tom Turney @no_stp_on_snek
Native Swift/Metal backend for vLLM on Apple Silicon. No Python in the inference hot path → better throughput + scaling. Try it: brew tap TheTom/tap && brew install vllm-swift Looking for beta testers → github.com/TheTom/vllm-sw…
[media: 1 image]
18 replies · 49 reposts · 417 likes · 27.9K views
⌀ phantom.ctx reposted
nico @nicochristie
We have been testing GPT 5.5 on the hardest spreadsheet tasks in the world (100k-1M+ cell complex models). It is the Pareto frontier for spreadsheets: SOTA accuracy, the fastest, and the most efficient public model across effort levels. OAI really cooked here.
[media: 1 image]
10 replies · 55 reposts · 773 likes · 51.2K views
⌀ phantom.ctx reposted
Sherwin Wu @sherwinwu
Set Codex to this and never look back. Medium reasoning effort is good enough for me for ~anything I need to do now.
[media: 1 image]
75 replies · 19 reposts · 1.3K likes · 84.1K views
⌀ phantom.ctx reposted
signüll @signulll
openai's product execution & velocity has stepped up noticeably, & the tone feels more human again. it felt corporate for a while in the middle. with the recent releases, including 5.5, you can feel the real focus & polish showing through again. credit where it's due, because the work on agents & codex is game-changing stuff for the broader economy. comms feels tighter & much more relatable. something changed. they clearly took the feedback to heart. it raises the question: is openai pivoting away from consumer stuff? or at least, is it p1 instead of p0 now? that would be a big shift.
Sam Altman @sama

These are cool! I think most companies will want to use them.

33 replies · 26 reposts · 744 likes · 62K views
⌀ phantom.ctx reposted
jason liu @jxnlco
As models get smarter, contradictions in your codebase and prompts are becoming more expensive. The more contradictions there are, the more the model needs a reason to work out under which circumstances each rule you lay out applies. It's kind of like when I talk to my girlfriend: "You said this, but you also said that, so I don't know what to do in this situation..."
9 replies · 3 reposts · 124 likes · 6.4K views
⌀ phantom.ctx reposted
NVIDIA @nvidia
Efficiency isn't just about speed anymore — it's about the massive reduction in the cost of intelligence. NVIDIA and @OpenAI's partnership leverages the GB200 NVL72 to deliver a 35x reduction in token costs, bringing enterprise-grade AI to an unprecedented scale. Trained and served on NVIDIA GB200 NVL72 systems, GPT-5.5 delivers the sustained performance required for execution-heavy, multi-step work — and at NVIDIA, that means teams are now scaling human ingenuity with OpenAI Codex Agents.
[media: 1 image]
52 replies · 117 reposts · 1.2K likes · 50.5K views
⌀ phantom.ctx reposted
Claude @claudeai
Memory on Claude Managed Agents is now in public beta. Your agents can now learn from every session, using an intelligence-optimized memory layer that balances performance with flexibility.
[media: 1 image]
250 replies · 478 reposts · 7.1K likes · 347.4K views
⌀ phantom.ctx reposted
Jonas @JonasBadalic
Pleased to announce that we've made @sentry slightly denser, and cleaned up the layout.
[media: 1 image]
3 replies · 4 reposts · 53 likes · 3.8K views
⌀ phantom.ctx reposted
Max Weinbach @mweinbach
This feels like the fastest new model rollout from OpenAI
10 replies · 4 reposts · 159 likes · 6.5K views
⌀ phantom.ctx reposted
Parallel Web Systems
The best web search for agents is now free. Upgrade to Parallel's web search tools in any MCP-supported tool or agent, for free, in under 60 seconds. No account. No API keys. Zero cost. docs.parallel.ai/integrations/m…
[media: GIF]
11 replies · 27 reposts · 246 likes · 119.4K views
⌀ phantom.ctx reposted
Lenny Rachitsky @lennysan
Claude Code's Head of Product: "The hardest PM skill right now is how to be the right amount of AGI-pilled."
Lenny Rachitsky @lennysan

How Anthropic's product team moves faster than anyone else. I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic's shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat's most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg

39 replies · 67 reposts · 737 likes · 141.8K views
⌀ phantom.ctx reposted
Dan McAteer @daniel_mac8
GPT-5.5 beats Opus 4.7 on several benchmarks, especially those related to agentic coding and tool calling. It's also pretty damn close to Claude Mythos, and even beats Mythos on Terminal-Bench 2.0. On top of that, GPT-5.5 is far more token efficient than Opus 4.7. OpenAI cooked this Spud 🥔.
[media: 2 images]
26 replies · 11 reposts · 173 likes · 7.8K views
⌀ phantom.ctx reposted
Laura Sandoval @laurasideral
Introducing a new way to manage your Notion agents in the Notion AI beta 💬 We want to make it easier for you to track your agents’ activity and course-correct when needed—so we’re bringing it to the forefront. Let us know what you think! Join the beta: testflight.apple.com/join/m2kxP5cw
5 replies · 11 reposts · 294 likes · 21.5K views