FateOfMuffins
147 posts


@AcerFur What if you /goal GPT 5.5 xHigh in codex with subagents to solve some math problems and compare with Pro?

@EMostaque @OpenAI I know it's not the same but have codex use GPT Pro in the browser xd

@ashrealite @AcerFur I doubt it'll be a *giant* difference but pretty sure some of the puzzles that ARC says the AI fails at... GPT 5.5 probably could do a couple of the easier ones in codex computer use and it'll be "efficient". It would still probably fail at a lot of the harder ones

@ashrealite @AcerFur You can read it and judge if I gave the model too much "help". I told it to write down its reasoning because the thinking traces don't get passed through in ChatGPT. chatgpt.com/share/69c4c4c8…

@ashrealite @AcerFur I know it's just a "trust me bro" but I only had generic custom instructions like avoiding emdashes
The reasoning it wrote down at least makes me think it understood the puzzle and was not just entirely clicking at random
It solved it in 24 steps

@andonlabs The slope increased dramatically for 5.5 near the end. What happened?
What happens if you let it run for 2 years instead of 1?

We got early access to GPT-5.5. It's 3rd on Vending-Bench 2: better than GPT-5.4 but worse than Opus 4.7.
However, it's on par with Opus 4.6 without any of the deception or power-seeking we saw from Opus 4.6 and Mythos. So bad behavior isn't necessary. Why is Claude doing it?

OpenAI @OpenAI
Introducing GPT-5.5
A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.


@GregHBurnham @nikhilchandak29 Although at the end of the day, if you tossed out 90% of the proposed problems, the final mock exams look decent
I can DM you the problem bank and you can take a quick look through some of the harder problems to see if it made any interesting ones

@GregHBurnham @nikhilchandak29 Ironically my students complained that my mocks were WAY harder than the actual contest lol

@GregHBurnham @nikhilchandak29 I have a project with 650 problems I generated in March for a contest last month. Agents then repeatedly narrowed them down through automated quality checks to about 40 or so, which I used to construct 2 mock exams.

@TheRealAdamG Codex will only be proto-AGI to me if it can port all the features that Codex has on Mac to Windows in the same release day

Codex, and what Codex becomes in the near term, to me is a proto-AGI.
We’re on the flight path, we’ve been descending down through the clouds for a while and we can now see visible landmarks — some familiar and some we’ve never seen (or never seen from this POV). What a time to be alive.
(This is not a formal OpenAI POV, rather just the musings of some random head of broccoli)

@AcerFur Correct me if I'm wrong but over the last decade, many of the researchers at OpenAI were scouted in the middle of their PhDs and they left to pursue AI research at OpenAI. Would they have advanced the field further by staying in academia than they did at the frontier labs?

@AcerFur How long would it take to finish a post grad? Where will math be at that point in time? Where can you make the biggest impact in math research?
Is it the traditional way where you're not getting a PhD for another 5 years or is it spearheading the AI research for math at OpenAI?





