Adi Ganesh

137 posts

@_adiganesh

Research @openai. Prev. @metaai @nuro @stanford @thielfellowship. Co-created @gradientpub

San Francisco, CA · Joined July 2023
1.3K Following · 1.2K Followers
Adi Ganesh reposted
Mehtaab Sawhney @mehtaab_sawhney
We are excited to share a new paper solving three further problems due to Erdős; in each case the solution was found by an internal model at OpenAI. Each proof is short and elegant, and the paper is available here: arxiv.org/pdf/2603.29961
25 replies · 146 reposts · 1K likes · 284.2K views
Jason Ginsberg @JasonBud
I’m proud to be joining SpaceX and xAI with @milichab. It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.
478 replies · 492 reposts · 8.6K likes · 47.9M views
Andrew Milich @milichab
I’m joining @SpaceX and @xai with @JasonBud. X is the company realizing science fiction - reusable rockets, humanoid robots, data centers in space, and more. Almost 10 years ago, I joined SpaceX as an intern on Dragon 2 crew displays. This was in the era of the first rocket landings on barges, long before the Dragon 2 restored human spaceflight to America or Starlink delivered internet from space. Every day since then, I’ve thought about the next steps to land on the Moon - and to build a city on Mars, data centers in space, the brains behind robots, and beyond. There is no better place to build teams and products from the ground up with planetary scale resources. If you’re looking to work on the hardest problems that lay a foundation for humanity’s future to the Moon, Mars, and beyond - DM me.
853 replies · 798 reposts · 8.9K likes · 9.5M views
Adi Ganesh reposted
Hanson Wang @hansonwng
How are GPT-5.4 and GPT-5.3-Codex so good at Terminal-Bench? Check out 5.4's solution to one of the hardest previously unsolved tasks (gpt2-codegolf) to see a particularly cool example!
Hanson Wang @hansonwng

x.com/i/article/2029…

1 reply · 5 reposts · 41 likes · 4.4K views
Adi Ganesh reposted
Yann Dubois @yanndubs
🔥Two things I'm esp excited about 5.4: 1. Unification: we merged our codex & mainline models 2. Efficiency: we brought the efficiency of 5.3-codex to CUA & knowledge work. We only showed 3 such plots in the blog but many of our evals required less time (tokens/tools) than 5.2. What should we fix for the next model?
51 replies · 29 reposts · 561 likes · 44.8K views
Adi Ganesh reposted
Hanson Wang @hansonwng
GPT-5.4 is here - with this release, the Codex and Thinking models are officially unified! 5.4 is even better at coding than 5.3-Codex and a huge step up from 5.2 in computer use and knowledge work. A big feel-the-AGI moment for me personally: leading up to the launch, I asked 5.4-xhigh in Codex to autonomously iterate on variations of Codex’s own system prompt. It ran for over 17 hours and ran 200+ evals, coming up with intelligent strategies like writing its own scripts to monitor eval progress and checking partial progress to prune less-than-promising branches. Throw your hardest tasks at it and see what happens!
OpenAI @OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

4 replies · 5 reposts · 144 likes · 7.5K views
Adi Ganesh reposted
SQ Mah @SQMah
Just demoed some of 5.4’s computer use and frontend capabilities - check it out here! What I really like is that computer use was on an Electron app, so Codex can also make and test desktop apps. Also, yes, I need a haircut :)
OpenAI Developers @OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

6 replies · 1 repost · 32 likes · 2.6K views
Adi Ganesh @_adiganesh
@larrylv was a pleasure training this one with you!
1 reply · 0 reposts · 1 like · 54 views
Adi Ganesh @_adiganesh
Excited to ship GPT-5.4! This is the first model I trust to run training jobs and evals for me. I can give it access to a GPU cluster and it launches my jobs, monitors them for errors, and debugs on my behalf. 5.4 is excellent at long horizon agentic coding, computer use, and knowledge work. This model combines the best of 5.3-codex and 5.2, and is launching in Codex, ChatGPT, and the API. Please try it out and share what you think :)
OpenAI @OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

0 replies · 0 reposts · 22 likes · 650 views
Mitchell Hashimoto @mitchellh
I know this is pretty well established at this point, but Codex 5.3 is a much more effective model than Opus 4.6. I went back and forth on both for a bit, but haven’t touched Opus at all now for a full week. First model to get me off of Opus… ever. Good job Codex team.
336 replies · 220 reposts · 5.3K likes · 1.1M views
Adi Ganesh @_adiganesh
@realchillben new workout idea: codex HIIT where you kick off new tasks in between sets
0 replies · 0 reposts · 1 like · 69 views
Bill Chen @realchillben
kicking off codex jobs before i go for a run
6 replies · 0 reposts · 38 likes · 10.9K views
Adi Ganesh reposted
Jakub Pachocki @merettm
Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The problems require expertise in their respective domains and are not easy to verify; based on feedback from experts, we believe at least six solutions (2, 4, 5, 6, 9, 10) have a high chance of being correct, and some further ones look promising. We will only publish the solution attempts after midnight (PT), per the authors' guidance - the sha256 hash of the PDF is d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a . This was a side-sprint executed in a week mostly by querying one of the models we're currently training; as such, the methodology we employed leaves a lot to be desired. We didn't provide proof ideas or mathematical suggestions to the model during this evaluation; for some solutions, we asked the model to expand upon some proofs, per expert feedback. We also manually facilitated a back-and-forth between this model and ChatGPT for verification, formatting and style. For some problems, we present the best of a few attempts according to human judgement. We are looking forward to more controlled evaluations in the next round! 1stproof.org #1stProof
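The post above publishes a SHA-256 hash ahead of the PDF so that readers can later confirm the released file is the one that was committed to. A minimal sketch of that check in Python, assuming the PDF has been downloaded locally; the function names and the file name in the usage note are hypothetical, and only the digest comes from the post:

```python
import hashlib
import hmac
from pathlib import Path

# Digest quoted in the post above; everything else here is illustrative.
PUBLISHED_SHA256 = "d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_commitment(path: Path, expected_hex: str = PUBLISHED_SHA256) -> bool:
    """Check whether the file's digest equals the previously published one."""
    actual = sha256_of_file(path)
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

For example, `matches_commitment(Path("solutions.pdf"))` (a hypothetical file name) would return `True` only if the downloaded file is byte-for-byte the one whose hash was posted.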
245 replies · 354 reposts · 2.8K likes · 2.5M views
Adi Ganesh @_adiganesh
@rauchg we’ve been working hard on making the model better at frontend - glad to see this reflected in your evals!
2 replies · 0 reposts · 15 likes · 2.1K views
Guillermo Rauch @rauchg
🆕 GPT 5.3 Codex (xhigh) achieves 90% on Next.js evals out of the box, "frame-mogging" the competition so to speak: nextjs.org/evals
88 replies · 90 reposts · 1.6K likes · 211.1K views
Adi Ganesh reposted
Jerry Tworek @MillionInt
Run fewer experiments and think about them more
19 replies · 30 reposts · 562 likes · 50.5K views
Adi Ganesh @_adiganesh
It’s kind of astonishing how I rarely need to manually type in code anymore with 5.3-codex, and how quickly this happened. Instead I’m mostly just managing N parallel checkouts of the code (N=5, for now) and delegating subtasks to the model. This feels like when Waymo quietly solved self-driving in SF and the world rapidly adapted to a new version of reality.
Josh McGrath @j_mcgraph

The timeline is crazy; it’s all us nerds getting one-shotted by some incredible models

0 replies · 0 reposts · 11 likes · 908 views