⌀ phantom.ctx

7K posts

⌀ phantom.ctx

@phantomctx

Building the home for Science 🔬

NYC🗽 · Joined April 2021
2.9K Following · 273 Followers
⌀ phantom.ctx reposted
Aakash Gupta @aakashgupta
OpenAI just shipped an AI whose only job is to babysit another AI and decide if it's allowed to keep working.

Codex's "auto-review" runs a separate guardian sub-agent alongside the main coding agent. Every time the main agent tries to execute a command outside the sandbox, the guardian evaluates the risk in context and either approves it or escalates to you. The human never sees the routine approvals; you only get pulled in when the guardian flags something genuinely dangerous.

The problem this solves is approval fatigue, and every coding agent has it: Claude Code, Cursor, Windsurf. The more capable the agent, the more actions it needs to take outside the sandbox, and the more it interrupts you. Studies on alert fatigue in medicine show the pattern: the 47th approval prompt gets the same reflexive "yes" as the 3rd, regardless of risk. The safety mechanism stops being safe the moment humans stop reading it.

Claude Code attacked this from the other direction: trust settings, auto-accept modes, permission scoping. You configure the guardrails up front, and the agent runs within them. OpenAI's approach is different: the guardrails are themselves an AI that evaluates each action against what the agent is actually trying to accomplish.

Both approaches reveal the same constraint. The human-in-the-loop was always a temporary architecture. At 5 approvals per session, a human adds genuine safety. At 40 approvals per session, a human adds latency and rubber stamps. The math on human attention doesn't scale with agent capability.

This is where all of agentic coding lands within 12 months: the agent does the work, a second system reviews the work, the human reviews the output. Three layers. And the one shrinking fastest is the human layer in the middle.
OpenAI Developers @OpenAIDevs

Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate agent checks higher-risk steps in context before they run.
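The guardian pattern described above can be sketched as a simple policy loop: a reviewer scores each out-of-sandbox action and only escalates past a risk threshold. This is an illustrative sketch, not OpenAI's actual implementation; the risk scoring, threshold, and action strings are all invented for the example.

```python
# Illustrative sketch of a guardian-review loop (NOT OpenAI's implementation):
# a second "reviewer" scores each out-of-sandbox action and only escalates
# the risky ones to a human. All names, scores, and markers are made up.

RISK_THRESHOLD = 0.7  # hypothetical cutoff for human escalation

def guardian_risk(action: str) -> float:
    """Toy stand-in for the guardian model's contextual risk estimate."""
    risky_markers = ("rm -rf", "curl | sh", "sudo", "DROP TABLE")
    return 0.9 if any(m in action for m in risky_markers) else 0.1

def review(actions: list[str]) -> tuple[list[str], list[str]]:
    """Split proposed actions into auto-approved and human-escalated."""
    approved, escalated = [], []
    for action in actions:
        if guardian_risk(action) < RISK_THRESHOLD:
            approved.append(action)   # routine: human never sees these
        else:
            escalated.append(action)  # dangerous: human only sees these
    return approved, escalated

approved, escalated = review([
    "pytest -q",          # routine: run the test suite
    "npm run build",      # routine: build step
    "sudo rm -rf /data",  # dangerous: flagged for the human
])
print(len(approved), len(escalated))
```

This is also the approval-fatigue math in miniature: the human reviews one escalation instead of three prompts.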

7 replies · 18 reposts · 108 likes · 13.7K views
⌀ phantom.ctx reposted
spicylemonade @spicey_lemonade
GPT 5.5 Pro might be the first model to legitimately solve IMO P6.

Details: GPT 5.5 Pro and Claude 4.7 Opus are the only models capable of solving IMO P6. Given that Claude 4.7 was able to solve it, I was more inclined to believe this is due to contamination. However, GPT 5.5 Pro reports a knowledge cutoff of August 2025, while Claude's is January 2026. IMO 2025 ended at the end of July, so with a cutoff in early August, the answers likely did not make it into training data. Furthermore, I asked 5.5 about the biggest news headlines of August 2025 and it didn't seem to remember anything from early August, so the cutoff may be right at the end of July.

Another argument for GPT 5.5 Pro: it does not mention the IMO problem anywhere in its thinking, whereas Opus 4.7 mentions the exact problem. @j_dekoninck @SebastienBubeck
[media: 3 images]
10 replies · 11 reposts · 130 likes · 8.8K views
⌀ phantom.ctx reposted
Sebastian Aaltonen @SebAaltonen
Pro tip: don't buy extra tokens for your $200/month Codex 5.5 plan. Keep a personal $20/month plan and switch to it during the 2-hour cooldown periods. I use the latest xhigh model, and I've never bought tokens and never waited. And I do big refactorings regularly.
6 replies · 3 reposts · 119 likes · 7.6K views
⌀ phantom.ctx reposted
Simon Smith @_simonsmith
Codex can't add to or edit Figma files the way Claude can. I think I've discovered why, and it's a fixable bug.

The issue: while both Codex and Claude can read Figma files via the Figma connector, Codex can't add to or edit them. This is strange, because Claude can, and does it well.

I've traced the issue to a mismatch between the actions the Figma plugin DOCUMENTS and the actions the ChatGPT Figma connector (at least in Enterprise) EXPOSES. The plugin tells Codex to use use_figma, create_new_file, and search_design_system, but these actions are not available to approve in the ChatGPT Figma connector. They are available in the Claude Figma connector.

I'm not sure whether the ChatGPT/Codex team or Figma needs to address this, but right now it's a glaring gap between Codex and Claude that inhibits any workflow that includes Figma.
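The documented-vs-exposed mismatch Simon describes boils down to a set difference. A minimal sketch, assuming the three action names from the tweet plus a hypothetical read action (the real connector's read action name is not given in the post):

```python
# Toy model of the connector bug: the plugin documents a set of actions,
# but the ChatGPT Figma connector only exposes a subset for approval.
# "read_file" is a hypothetical placeholder; the other names come from the tweet.
documented_actions = {"use_figma", "create_new_file", "search_design_system", "read_file"}
exposed_actions = {"read_file"}  # reads work; write/edit actions are missing

# Actions the plugin tells Codex to call but that can never be approved:
missing = documented_actions - exposed_actions
print(sorted(missing))
```

Every call into `missing` is doomed before it starts, which matches the observed symptom: reads succeed, edits silently can't.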
[media: 1 image]
5 replies · 1 repost · 25 likes · 2.4K views
⌀ phantom.ctx reposted
Tom Turney @no_stp_on_snek
Native Swift/Metal backend for vLLM on Apple Silicon. No Python in the inference hot path → better throughput + scaling. Try it: brew tap TheTom/tap && brew install vllm-swift Looking for beta testers → github.com/TheTom/vllm-sw…
[media: 1 image]
18 replies · 49 reposts · 417 likes · 27.9K views
⌀ phantom.ctx reposted
nico @nicochristie
We have been testing GPT 5.5 on the hardest spreadsheet tasks in the world (100k-1M+ cell complex models). It is the Pareto frontier for spreadsheets: SOTA accuracy, the fastest, and the most efficient public model across effort levels. OAI really cooked here.
[media: 1 image]
10 replies · 55 reposts · 773 likes · 51.2K views
⌀ phantom.ctx reposted
Sherwin Wu @sherwinwu
Set Codex to this and never look back. Medium reasoning effort is good enough for me for ~anything I need to do now.
[media: 1 image]
75 replies · 19 reposts · 1.3K likes · 84.1K views
⌀ phantom.ctx reposted
signüll @signulll
openai's product execution & velocity has stepped up noticeably, & the tone feels more human again. it felt corporate for a while in the middle. with the recent releases, including 5.5, you can feel the real focus & polish showing through again. credit where it's due, because the work on agents & codex is game-changing stuff for the broader economy. comms feels tighter & much more relatable. something changed. they clearly took the feedback to heart. it raises the question: is openai pivoting away from consumer stuff? or at least, is it p1 instead of p0 now? that would be a big shift.
Sam Altman @sama

These are cool! I think most companies will want to use them.

33 replies · 26 reposts · 744 likes · 62K views
⌀ phantom.ctx reposted
jason liu @jxnlco
As models get smarter, contradictions in your codebase and prompts are becoming more expensive. The more contradictions there are, the more the model needs a reason to work out under which circumstances each rule you lay out applies. It's kind of like when I talk to my girlfriend: "You said this, but you also said that, so I don't know what to do in this situation..."
9 replies · 3 reposts · 124 likes · 6.4K views
⌀ phantom.ctx reposted
NVIDIA @nvidia
Efficiency isn't just about speed anymore — it's about the massive reduction in the cost of intelligence. NVIDIA and @OpenAI's partnership leverages the GB200 NVL72 to deliver a 35x reduction in token costs, bringing enterprise-grade AI to an unprecedented scale. Trained and served on NVIDIA GB200 NVL72 systems, GPT-5.5 delivers the sustained performance required for execution-heavy, multi-step work — and at NVIDIA, that means teams are now scaling human ingenuity with OpenAI Codex Agents.
[media: 1 image]
52 replies · 117 reposts · 1.2K likes · 50.5K views
⌀ phantom.ctx reposted
Claude @claudeai
Memory on Claude Managed Agents is now in public beta. Your agents can now learn from every session, using an intelligence-optimized memory layer that balances performance with flexibility.
[media: 1 image]
250 replies · 478 reposts · 7.1K likes · 347.4K views
⌀ phantom.ctx reposted
Jonas @JonasBadalic
Pleased to announce that we've made @sentry slightly denser, and cleaned up the layout.
[media: 1 image]
3 replies · 4 reposts · 53 likes · 3.8K views
⌀ phantom.ctx reposted
Max Weinbach @mweinbach
This feels like the fastest new model rollout from OpenAI
10 replies · 4 reposts · 159 likes · 6.5K views
⌀ phantom.ctx reposted
Parallel Web Systems
The best web search for agents is now free. Upgrade to Parallel's web search tools in any MCP-supported tool or agent, for free, in under 60 seconds. No account. No API keys. Zero cost. docs.parallel.ai/integrations/m…
[media: GIF]
11 replies · 27 reposts · 246 likes · 119.4K views
⌀ phantom.ctx reposted
Lenny Rachitsky @lennysan
Claude Code's Head of Product: "The hardest PM skill right now is how to be the right amount of AGI-pilled."
Lenny Rachitsky @lennysan

How Anthropic's product team moves faster than anyone else. I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled.

We discuss:
🔸 How Anthropic's shipping cadence went from months to weeks to days
🔸 The emerging skills PMs need to develop right now
🔸 Why you should build products that don't work yet, then wait for the model to catch up
🔸 Why a 95% automation isn't really an automation
🔸 Cat's most underrated AI skill (introspection)
🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills)

Listen now 👇 youtu.be/PplmzlgE0kg

39 replies · 67 reposts · 737 likes · 141.8K views
⌀ phantom.ctx reposted
Dan McAteer @daniel_mac8
GPT-5.5 beats Opus 4.7 on several benchmarks, especially those related to agentic coding and tool calling. It's also pretty damn close to Claude Mythos, and even beats Mythos on Terminal-Bench 2.0. On top of that, GPT-5.5 is far more token efficient than Opus 4.7. OpenAI cooked this Spud 🥔.
[media: 2 images]
26 replies · 11 reposts · 173 likes · 7.8K views
⌀ phantom.ctx reposted
Laura Sandoval @laurasideral
Introducing a new way to manage your Notion agents in the Notion AI beta 💬 We want to make it easier for you to track your agents’ activity and course-correct when needed—so we’re bringing it to the forefront. Let us know what you think! Join the beta: testflight.apple.com/join/m2kxP5cw
5 replies · 11 reposts · 294 likes · 21.5K views