Adi Ganesh

137 posts

@_adiganesh

Research @openai. Prev. @metaai @nuro @stanford @thielfellowship. Co-created @gradientpub

San Francisco, CA · Joined July 2023

1.3K Following · 1.2K Followers

Pinned Tweet

Adi Ganesh@_adiganesh·7 Aug

Incredibly excited to release GPT-5 to the world! It’s a great coding model, and particularly good at long agentic tasks and frontend coding. It’s the smartest model we’ve ever shipped, and I’m eager to see how people use it. Please try it and share what you think :)

Sam Altman@sama

GPT-5 can do very complex software engineering tasks in practice, well beyond vibe coding

Adi Ganesh retweeted

Mehtaab Sawhney@mehtaab_sawhney·1d

We are excited to share a new paper solving three further problems due to Erdős; in each case the solution was found by an internal model at OpenAI. Each proof is short and elegant, and the paper is available here: arxiv.org/pdf/2603.29961

Adi Ganesh@_adiganesh·14 Mar

Great article from @thekaransinghal on how to rigorously evaluate AI systems in medical contexts:

Karan Singhal@thekaransinghal

x.com/i/article/2032…

Adi Ganesh@_adiganesh·13 Mar

@JasonBud @milichab congrats!

Jason Ginsberg@JasonBud·12 Mar

I’m proud to be joining SpaceX and xAI with @milichab. It has become clear that software is changing fundamentally. More and more, people can shape the tools they use directly, and the ceiling of what can be built keeps rising. What makes xAI special is the scale of its ambition: to build from first principles all the way out to the stars. I’m especially grateful to work on products that expand human agency and freedom. That mission is deeply personal to me. My family came to the United States fleeing communism, and the belief that freedom should be part of the next generation of the internet has driven me every day since Andrew and I started Skiff. Now, we get to work on intelligence, understanding, and freedom on a universal scale.

Adi Ganesh@_adiganesh·13 Mar

@milichab @SpaceX @xai @JasonBud congrats Andrew!

Andrew Milich@milichab·12 Mar

I’m joining @SpaceX and @xai with @JasonBud. X is the company realizing science fiction - reusable rockets, humanoid robots, data centers in space, and more. Almost 10 years ago, I joined SpaceX as an intern on Dragon 2 crew displays. This was in the era of the first rocket landings on barges, long before the Dragon 2 restored human spaceflight to America or Starlink delivered internet from space. Every day since then, I’ve thought about the next steps to land on the Moon - and to build a city on Mars, data centers in space, the brains behind robots, and beyond. There is no better place to build teams and products from the ground up with planetary scale resources. If you’re looking to work on the hardest problems that lay a foundation for humanity’s future to the Moon, Mars, and beyond - DM me.

Adi Ganesh retweeted

Hanson Wang@hansonwng·6 Mar

How are GPT-5.4 and GPT-5.3-Codex so good at Terminal-Bench? Check out 5.4's solution to one of the hardest previously unsolved tasks (gpt2-codegolf) to see a particularly cool example!

Hanson Wang@hansonwng

x.com/i/article/2029…

Adi Ganesh retweeted

Yann Dubois@yanndubs·6 Mar

🔥Two things I'm esp excited about 5.4:
1. Unification: we merged our codex & mainline models
2. Efficiency: we brought the efficiency of 5.3-codex to CUA & knowledge work. We only showed 3 such plots in the blog but many of our evals required less time (tokens/tools) than 5.2.
What should we fix for the next model?

Adi Ganesh retweeted

Hanson Wang@hansonwng·6 Mar

GPT-5.4 is here - with this release, the Codex and Thinking models are officially unified! 5.4 is even better at coding than 5.3-Codex and a huge step up from 5.2 in computer use and knowledge work. A big feel-the-AGI moment for me personally: leading up to the launch, I asked 5.4-xhigh in Codex to autonomously iterate on variations of Codex’s own system prompt. It ran for over 17 hours and ran 200+ evals, coming up with intelligent strategies like writing its own scripts to monitor eval progress and checking partial progress to prune less-than-promising branches. Throw your hardest tasks at it and see what happens!

OpenAI@OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

Adi Ganesh@_adiganesh·6 Mar

@SQMah was a blast working together on this model!

Adi Ganesh retweeted

SQ Mah@SQMah·6 Mar

Just demoed some of 5.4’s computer use and frontend capabilities - check it out here! What I really like is that computer use was on an Electron app, so Codex can also make and test desktop apps. Also, yes, I need a haircut :)

OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

Larry Lv@larrylv·5 Mar

GPT-5.4 is here! It was fun training this one with so many wonderful colleagues.

OpenAI@OpenAI

Adi Ganesh@_adiganesh·5 Mar

@larrylv was a pleasure training this one with you!

Adi Ganesh@_adiganesh·5 Mar

Excited to ship GPT-5.4! This is the first model I trust to run training jobs and evals for me. I can give it access to a GPU cluster and it launches my jobs, monitors them for errors, and debugs on my behalf. 5.4 is excellent at long horizon agentic coding, computer use, and knowledge work. This model combines the best of 5.3-codex and 5.2, and is launching in Codex, ChatGPT, and the API. Please try it out and share what you think :)

OpenAI@OpenAI

Adi Ganesh@_adiganesh·27 Feb

@mitchellh Glad you like the model!

Mitchell Hashimoto@mitchellh·25 Feb

I know this is pretty well established at this point, but Codex 5.3 is a much more effective model than Opus 4.6. I went back and forth on both for a bit, but haven’t touched Opus at all now for a full week. First model to get me off of Opus… ever. Good job Codex team.

Adi Ganesh@_adiganesh·18 Feb

@realchillben new workout idea: codex HIIT where you kick off new tasks in between sets

Bill Chen@realchillben·17 Feb

kicking off codex jobs before i go for a run

Adi Ganesh retweeted

Jakub Pachocki@merettm·14 Feb

Solution attempts from our model: cdn.openai.com/pdf/a430f16e-0…

Adi Ganesh retweeted

Jakub Pachocki@merettm·14 Feb

Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The problems require expertise in their respective domains and are not easy to verify; based on feedback from experts, we believe at least six solutions (2, 4, 5, 6, 9, 10) have a high chance of being correct, and some further ones look promising. We will only publish the solution attempts after midnight (PT), per the authors' guidance - the sha256 hash of the PDF is d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a . This was a side-sprint executed in a week mostly by querying one of the models we're currently training; as such, the methodology we employed leaves a lot to be desired. We didn't provide proof ideas or mathematical suggestions to the model during this evaluation; for some solutions, we asked the model to expand upon some proofs, per expert feedback. We also manually facilitated a back-and-forth between this model and ChatGPT for verification, formatting and style. For some problems, we present the best of a few attempts according to human judgement. We are looking forward to more controlled evaluations in the next round! 1stproof.org #1stProof
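
The hash commitment described in the post above can be checked by anyone who later downloads the PDF. The following is a minimal sketch, not the authors' procedure: the function names and the local file path are hypothetical, and only the hash value comes from the post.

```python
import hashlib

# SHA-256 digest quoted in the post above.
PUBLISHED_SHA256 = "d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=PUBLISHED_SHA256):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `matches_published_hash("solutions.pdf")` (a hypothetical local filename) returns `True` only if the downloaded file is byte-for-byte identical to the one that was hashed before publication.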

Adi Ganesh retweeted

Hanson Wang@hansonwng·12 Feb

✨1000 tokens per second!

OpenAI Developers@OpenAIDevs

Introducing GPT-5.3-Codex-Spark, our ultra-fast model purpose built for real-time coding. We’re rolling it out as a research preview for ChatGPT Pro users in the Codex app, Codex CLI, and IDE extension.

Adi Ganesh@_adiganesh·10 Feb

@rauchg we’ve been working hard on making the model better at frontend - glad to see this reflected in your evals!

Guillermo Rauch@rauchg·10 Feb

🆕 GPT 5.3 Codex (xhigh) achieves 90% on Next.js evals out of the box, "frame-mogging" the competition so to speak: nextjs.org/evals

Adi Ganesh retweeted

Jerry Tworek@MillionInt·7 Feb

Run fewer experiments and think about them more

Adi Ganesh@_adiganesh·7 Feb

It’s kind of astonishing how I rarely need to manually type in code anymore with 5.3-codex, and how quickly this happened. Instead I’m mostly just managing N parallel checkouts of the code (N=5, for now) and delegating subtasks to the model. This feels like when Waymo quietly solved self-driving in SF and the world rapidly adapted to a new version of reality.

Josh McGrath@j_mcgraph

The timeline is crazy; it’s all us nerds getting one-shotted by some incredible models
