Casey Chu

265 posts

@caseychu9

Researcher at @openai

San Francisco, CA · Joined August 2017

707 Following · 4.1K Followers

Pinned Tweet

Casey Chu@caseychu9·Jul 17

We launched ChatGPT Agent today! When tested on a variety of REAL work tasks (expert tasks that might take >10h), we found that its output was human-quality almost 50% of the time. Agent puts o3's intelligence into practice - try your work tasks and let us know how it goes!

OpenAI@OpenAI

ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.

138

15.2K

Casey Chu@caseychu9·Mar 6

@nickcammarata It's here! x.com/OpenAI/status/…

OpenAI@OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

329

Nick@nickcammarata·Mar 6

the progression: no paper → no weights → benchmarks that don’t compare to other company’s models. next up: just a photo of the team looking confident and smiling

353

17.7K

Casey Chu@caseychu9·Nov 24

in time for the holidays!

OpenAI@OpenAI

Introducing shopping research, a new experience in ChatGPT that does the research to help you find the right products. It’s everything you like about deep research but with an interactive interface to help you make smarter purchasing decisions.

688

Casey Chu retweeted

Zeke Darwin@Zeke_Darwin·Oct 20

Last year I posted a video about a ~200,000 year old denisovan genome. Today, we got a pre-print… and some beautiful figures. It’s seeming more and more likely that our divergences have been underestimated due to the hybridization events that occurred after.

294

51.7K

Casey Chu retweeted

Will Ellsworth@will_ellsworth_·Oct 21

It’s time for vibe-lifeing

OpenAI@OpenAI

Meet our new browser—ChatGPT Atlas. Available today on macOS: chatgpt.com/atlas

30.3K

Casey Chu@caseychu9·Sep 8

@BarrAlexandra They aren't harder - there was just no demand for them before!

212

Alexandra Barr@BarrAlexandra·Sep 6

why are rl envs so much harder to build than simulated game envs // or why are we just building them now when we solved temple run like 5 yrs ago

2.1K

Casey Chu@caseychu9·Aug 29

@sainingxie I did that question too! That, and a question about DDPG from @lilianweng, made me so excited to work at OpenAI :)

3.1K

Saining Xie@sainingxie·Aug 29

good question... thinking back to pre-LLM interviews I experienced (before 2019)… they were all in-person on-site, no chance of ''llm cheating,'' very different across places, and somehow way more memorable.
> old deepmind had brutal ''quizzes'' -- 2-hour marathons with 100+ math/stats/ML concept questions.
> meta FAIR was basically academia interview with a bit of coding, but the highlight was chatting vision research with piotr, ross and kaiming.
> google brain/research was similar. the @NoamShazeer was my coding interviewer, who kindly kept it simple with just a two-pointer q. we spent most of the time discussing research, where I explained how I had applied something called a transformer to visual data (point clouds) -- a topic that, at the time, hardly anyone cared about.
> but the coolest? openai in 2018: whiteboard coding, a research talk, and a *~5-hour* session in a tiny room to work on an RL problem (variance collapse in cross entropy methods). I knew almost nothing about RL, but that was the point. They handed you a self-contained problem description, handwritten by @johnschulman2, and expected you to learn, research, solve, write up in a notebook, and present.
feeling a bit nostalgic. makes me wonder if interviews like that still happen anywhere. If they do, I’d love to know. :)

Lucas Beyer (bl16)@giffmana

At which of these places did you have the coolest interview in your career? I know it's an ill-posed poll, but what am i gonna do with only 4 options?! I tried grouping them by interview similarity to the best of my knowledge. Comment if "other". Might make a second round.

120

2.4K

299.1K

Casey Chu@caseychu9·Aug 8

@idomyowntricks Thanks - should be fixed

Brian Christner@idomyowntricks·Aug 7

@caseychu9 just as info

Brian Christner@idomyowntricks·Aug 7

Seems @OpenAI Agents don't work yet with the GPT-5 model. Also, it's not possible to select previous models.

170

Casey Chu@caseychu9·Aug 3

@__paleologo Very satisfying! ChatGPT gave this other one-liner with a different flavor

1.3K

Gappy (Giuseppe Paleologo)@__paleologo·Aug 2

I did not know that, for any random variable x, | mean(x) - median(x) | <= stdev(x) Direct proof:

201

2.8K

243K
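The proofs attached to the two tweets above (images) did not survive in this capture. For reference only, here is one standard argument for the inequality. It is not necessarily either of the proofs that were posted. It assumes x has finite variance and writes μ for the mean, m for a median, and σ for the standard deviation (these symbols are introduced here, not taken from the thread):

```latex
\begin{align*}
|\mu - m| &= \bigl|\mathbb{E}[x - m]\bigr|
  && \text{(linearity of expectation)} \\
&\le \mathbb{E}\,|x - m|
  && \text{(Jensen, since } |\cdot| \text{ is convex)} \\
&\le \mathbb{E}\,|x - \mu|
  && \text{(a median minimizes } c \mapsto \mathbb{E}\,|x - c|\text{)} \\
&\le \sqrt{\mathbb{E}\bigl[(x - \mu)^2\bigr]} = \sigma
  && \text{(Jensen again, or Cauchy--Schwarz)}
\end{align*}
```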

Casey Chu@caseychu9·Jul 26

@alexandr_wang @shengjia_zhao Congrats @shengjia_zhao!

461

Alexandr Wang@alexandr_wang·Jul 25

We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs! Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead our scientific direction for our team. Let's go 🚀

338

462

3.6M

Casey Chu retweeted

Jerry Tworek@MillionInt·Jul 19

To summarize this week:
- we released general purpose computer using agent
- got beaten by a single human in atcoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system

112

1.3K

172.6K

Casey Chu@caseychu9·Jul 18

@yongyuanxi date picking is the final boss

189

Towaki Takikawa / 瀧川永遠希@yongyuanxi·Jul 18

OpenAI agent mode- not very fun to doomwatch since it makes a lot of small obvious mistakes (reminds me of teaching people who don't use computers how to use the file navigator). It gave up using Google Flights after a bit and is now on Kayak

1.8K

Casey Chu@caseychu9·Jul 18

@SarahChieng omg guilty 😭

124

Sarah Chieng@MilksandMatcha·Jul 18

@caseychu9 pov why casey chu went dark pt 2

154

Casey Chu@caseychu9·Jul 17

@0xTejpal @METR_Evals I'm a big fan of @METR_Evals! They are looking at software engineering tasks graded on objective correctness, while our eval is broader than that (the output is usually docs, sheets, slides), graded relative to a reference human output

237

Tejpal Singh@0xTejpal·Jul 17

@caseychu9 >10hr tasks at 50% reliability seems like a very different picture from what @METR_Evals has shown. Could you elaborate on this discrepancy?

248

Casey Chu retweeted

Sam Altman@sama·Jul 17

watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.

1.1K

792

12.7K

4.2M

Casey Chu@caseychu9·Jul 17

Huge thanks to @gracejkim9 Elizabeth Proehl @michelelwang @marwan_aljubeh @rachelds__ @tejalpatwardhan for putting this eval together, letting us measure Agent's capabilities in realistic work settings 💪

731

Casey Chu@caseychu9·Jul 17

working on bringing that pass@16 number down to pass@1 💪

Epoch AI@EpochAIResearch

@OpenAI We also found that, when allowed 16 tries per problem, ChatGPT agent’s score grew from 27% to 49% on the tier 1-3 set. This suggests that better prompting or scaffolding might result in better performance from current models.

2.8K
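For readers unfamiliar with the pass@16 and pass@1 terms in the exchange above: pass@k is the chance that at least one of k attempts at a problem succeeds. Epoch AI's exact scoring procedure is not described here, so the sketch below only illustrates one commonly used estimator (from n recorded attempts with c successes). The function name `pass_at_k` and the example counts are assumptions for illustration, not figures from the thread.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts is correct).

    n: total attempts recorded for a problem
    c: how many of those attempts were correct
    k: attempt budget being scored (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k incorrect attempts, any k-subset contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of k-subsets made up entirely of incorrect attempts.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 16 attempts, 4 of them correct.
single_try = pass_at_k(16, 4, 1)    # chance a single attempt succeeds
all_tries = pass_at_k(16, 4, 16)    # chance at least one of 16 succeeds
```

Averaging this quantity over a problem set gives the benchmark-level score; the gap between the k=1 and k=16 averages is what the tweet above is about closing.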

Casey Chu@caseychu9·Jul 17

@reiinakano and @shgusdngogo pioneered building computer-using agents - so exciting to finally see their work in ChatGPT!

Reiichiro Nakano@reiinakano

huge congrats to @isafulf @caseychu9 @EdwardSun0909 Yash Kumar, and the entire Agent team for this huge achievement!! Was a witness to their early efforts and sleepless nights and hard work, it's truly amazing to see it all come together today 🥹

354

Casey Chu@caseychu9·Jul 17

Great post from @xikun_zhang_, who did a great job making sure collaboration with Agent feels good!

Xikun Zhang 张熙堃@xikun_zhang_

Just launched ChatGPT Agent (sorry GPT-5 waiters, it is coming!), the most capable AI agent model to date! It has been such an honor to be part of a crazy sprint to get this amazing model trained and shipped together with an absolutely gem team (@isafulf, @caseychu9, @EdwardSun0909, @josh_tobin_, Yash Kumar and many more)! I am so proud of this project, so I want to share some highlights, personal takes and lessons learned while working on it:
1. Used for research 📕 + actions 💻 + slides generation: Deep Research can do research. Operator can take actions for you. ChatGPT Agent can do both at the same time! E.g. you can ask it to make a plan for a trip to Hawaii, find good deals on hotels and flights, and book them on your behalf using its own computer! It can also generate slides!
2. Power of end-to-end RL: How do we build it? You guessed it right! It is us, @OpenAI RL diehards. You are probably tired of hearing about RL scaling. Me, too. But when I feel its power first-hand, its effectiveness and data efficiency still shock me and feel like magic 🪄.
3. First OpenAI model of high biorisk 💀: Not sure this is something I should be proud of or not :) For an ex-AI bio PhD researcher like me, this is something a bit personal. On one hand, many of my biomedicine researcher friends tell me that AI agents have significantly helped with their research. On the other hand, such a capable model can amplify the risk of malicious actors building bioweapons. Our safety team has done incredible work to mitigate the risks.
4. Collaboration with users 👪 is core: We want our AI to augment and enhance humans, not to replace them, so we work hard to make the model good at collaborating with the user. You can type a message at any time to interrupt it and steer it to new directions. The model will always confirm with you before taking actions like buying things for you or deleting a file on your google drive. And the model will ask clarification questions only when it needs more clarity from you!
5. How to generate good slides: As in other cases, writing a well-specified prompt always helps! Also try first telling it to generate a report, then convert the report into slides!
6. Real-world performance > benchmark chasing: One thing outside people may not know about us is how little attention we pay to external benchmarks during the model dev process. We do not focus on hill-climbing on them, and we do not care that much about how we end up on the leaderboard. That said, as a byproduct of our pursuit of great real-world performance and true intelligence, ChatGPT Agent does crush many benchmarks! Wanna learn more? Read our blog linked in the end!
In the end, I want to shout out to my amazing team again. These extremely talented and kind people are the reason why OpenAI is constantly making magic like this! ❤️ Also please try ChatGPT Agent and give us feedback! You can reply here in the thread or my DM is open. This is just the start. We will continue working hard towards more and more capable super-human AI agents! 🤖 openai.com/index/introduc…

982

Casey Chu retweeted

OpenAI@OpenAI·Jul 17

641

705

7.6K

2.9M
