Casey Chu

265 posts

@caseychu9

Researcher at @openai

San Francisco, CA · Joined August 2017
707 Following · 4.1K Followers
Pinned Tweet
Casey Chu
Casey Chu@caseychu9·
We launched ChatGPT Agent today! When tested on a variety of REAL work tasks (expert tasks that might take >10h), we found that its output was human-quality almost 50% of the time. Agent puts o3's intelligence into practice - try your work tasks and let us know how it goes!
Casey Chu tweet media
OpenAI@OpenAI

ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.

English
14
7
138
15.2K
Nick
Nick@nickcammarata·
the progression: no paper → no weights → benchmarks that don’t compare to other company’s models. next up: just a photo of the team looking confident and smiling
Nick tweet media
English
15
8
353
17.7K
Casey Chu retweeted
Zeke Darwin
Zeke Darwin@Zeke_Darwin·
Last year I posted a video about a ~200,000 year old denisovan genome. Today, we got a pre-print… and some beautiful figures. It’s seeming more and more likely that our divergences have been underestimated due to the hybridization events that occurred after.
Zeke Darwin tweet mediaZeke Darwin tweet mediaZeke Darwin tweet media
English
12
35
294
51.7K
Casey Chu
Casey Chu@caseychu9·
@BarrAlexandra They aren't harder - there was just no demand for them before!
English
1
0
1
212
Alexandra Barr
Alexandra Barr@BarrAlexandra·
why are rl envs so much harder to build than simulated game envs // or why are we just building them now when we solved temple run like 5 yrs ago
English
3
0
9
2.1K
Casey Chu
Casey Chu@caseychu9·
@sainingxie I did that question too! That, and a question about DDPG from @lilianweng, made me so excited to work at OpenAI :)
English
1
0
9
3.1K
Saining Xie
Saining Xie@sainingxie·
good question... thinking back to pre-LLM interviews I experienced (before 2019)… they were all in-person on-site, no chance of "llm cheating," very different across places, and somehow way more memorable.
> old deepmind had brutal "quizzes" -- 2-hour marathons with 100+ math/stats/ML concept questions.
> meta FAIR was basically academia interview with a bit of coding, but the highlight was chatting vision research with piotr, ross and kaiming.
> google brain/research was similar. the @NoamShazeer was my coding interviewer, who kindly kept it simple with just a two-pointer q. we spent most of the time discussing research, where I explained how I had applied something called a transformer to visual data (point clouds) -- a topic that, at the time, hardly anyone cared about.
> but the coolest? openai in 2018: whiteboard coding, a research talk, and a *~5-hour* session in a tiny room to work on an RL problem (variance collapse in cross entropy methods). I knew almost nothing about RL, but that was the point. They handed you a self-contained problem description, handwritten by @johnschulman2, and expected you to learn, research, solve, write up in a notebook, and present.
feeling a bit nostalgic. makes me wonder if interviews like that still happen anywhere. If they do, I’d love to know. :)
Saining Xie tweet mediaSaining Xie tweet media
Lucas Beyer (bl16)@giffmana

At which of these places did you have the coolest interview in your career? I know it's an ill-posed poll, but what am i gonna do with only 4 options?! I tried grouping them by interview similarity to the best of my knowledge. Comment if "other". Might make a second round.

English
22
120
2.4K
299.1K
Brian Christner
Brian Christner@idomyowntricks·
Seems @OpenAI Agents don't work yet with the GPT-5 model. Also, it's not possible to select previous models.
Brian Christner tweet media
English
2
0
0
170
Casey Chu
Casey Chu@caseychu9·
@__paleologo Very satisfying! ChatGPT gave this other one-liner with a different flavor
Casey Chu tweet media
English
1
0
8
1.3K
Gappy (Giuseppe Paleologo)
Gappy (Giuseppe Paleologo)@__paleologo·
I did not know that, for any random variable x, |mean(x) - median(x)| <= stdev(x). Direct proof:
Gappy (Giuseppe Paleologo) tweet media
English
52
201
2.8K
243K
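The proof in the attached image is not recoverable from this page. For reference, one standard argument (not necessarily the one in the image) runs as follows, assuming x has finite variance and writing μ for its mean, m for any median, and σ for its standard deviation:

```latex
\[
|\mu - m|
  = \bigl|\mathbb{E}[X - m]\bigr|
  \le \mathbb{E}\,|X - m|
  \le \mathbb{E}\,|X - \mu|
  \le \sqrt{\mathbb{E}\bigl[(X - \mu)^2\bigr]}
  = \sigma
\]
```

The first step is Jensen's inequality for the absolute value, the second uses the fact that a median minimizes c ↦ E|X − c|, and the third is Jensen's inequality (equivalently Cauchy-Schwarz) applied to |X − μ|.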
Alexandr Wang
Alexandr Wang@alexandr_wang·
We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs! Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead our scientific direction for our team. Let's go 🚀
Alexandr Wang tweet media
English
338
462
8K
3.6M
Casey Chu retweeted
Jerry Tworek
Jerry Tworek@MillionInt·
To summarize this week:
- we released general purpose computer using agent
- got beaten by a single human in atcoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system
English
43
112
1.3K
172.6K
Towaki Takikawa / 瀧川永遠希
OpenAI agent mode- not very fun to doomwatch since it makes a lot of small obvious mistakes (reminds me of teaching people who don't use computers how to use the file navigator). It gave up using Google Flights after a bit and is now on Kayak
Towaki Takikawa / 瀧川永遠希 tweet media
English
2
1
6
1.8K
Casey Chu
Casey Chu@caseychu9·
@0xTejpal @METR_Evals I'm a big fan of @METR_Evals! They are looking at software engineering tasks graded on objective correctness, while our eval is broader than that (the output is usually docs, sheets, slides), graded relative to a reference human output
English
1
0
3
237
Tejpal Singh
Tejpal Singh@0xTejpal·
@caseychu9 >10hr tasks at 50% reliability seems like a very different picture from what @METR_Evals has shown. Could you elaborate on this discrepancy?
Tejpal Singh tweet media
English
2
0
3
248
Casey Chu retweeted
Sam Altman
Sam Altman@sama·
watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.
English
1.1K
792
12.7K
4.2M
Casey Chu
Casey Chu@caseychu9·
working on bringing that pass@16 number down to pass@1 💪
Epoch AI@EpochAIResearch

@OpenAI We also found that, when allowed 16 tries per problem, ChatGPT agent’s score grew from 27% to 49% on the tier 1-3 set. This suggests that better prompting or scaffolding might result in better performance from current models.

English
1
0
33
2.8K
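For readers unfamiliar with the pass@k notation in the exchange above: pass@k is the chance that at least one of k attempts at a problem succeeds. The sketch below uses the common unbiased estimator from n sampled attempts with c correct; it is an illustration only, the function name `pass_at_k` is hypothetical, and the posts do not say how Epoch AI actually computed its 16-try number.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n attempts, c of which were correct.

    Assumption: attempts are exchangeable samples from the same model.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fraction of size-k subsets of the n attempts that contain no correct attempt.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_attempts, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to the plain success rate c / n, so moving a pass@16 score down to pass@1 means making a single attempt as reliable as the best of sixteen.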
Casey Chu
Casey Chu@caseychu9·
Great post from @xikun_zhang_, who did a great job making sure collaboration with Agent feels good!
Xikun Zhang 张熙堃@xikun_zhang_

Just launched ChatGPT Agent (sorry GPT-5 waiters, it is coming!), the most capable AI agent model to date! It has been such an honor to be part of a crazy sprint to get this amazing model trained and shipped together with an absolute gem of a team (@isafulf, @caseychu9, @EdwardSun0909, @josh_tobin_, Yash Kumar and many more)! I am so proud of this project, so I want to share some highlights, personal takes and lessons learned while working on it:
1. Used for research 📕 + actions 💻 + slides generation: Deep Research can do research. Operator can take actions for you. ChatGPT Agent can do both at the same time! E.g. you can ask it to make a plan for a trip to Hawaii, find good deals on hotels and flights, and book them on your behalf using its own computer! It can also generate slides!
2. Power of end-to-end RL: How do we build it? You guessed it right! It is us, @OpenAI RL diehards. You are probably tired of hearing about RL scaling. Me, too. But when I feel its power first-hand, its effectiveness and data efficiency still shock me and feel like magic 🪄.
3. First OpenAI model of high biorisk 💀: Not sure if this is something I should be proud of or not :) For an ex-AI bio PhD researcher like me, this is something a bit personal. On one hand, many of my biomedicine researcher friends tell me that AI agents have significantly helped with their research. On the other hand, such a capable model can amplify the risk of malicious actors building bioweapons. Our safety team has done incredible work to mitigate the risks.
4. Collaboration with users 👪 is core: We want our AI to augment and enhance humans, not to replace them, so we work hard to make the model good at collaborating with the user. You can type a message at any time to interrupt it and steer it in new directions. The model will always confirm with you before taking actions like buying things for you or deleting a file on your google drive. And the model will ask clarification questions only when it needs more clarity from you!
5. How to generate good slides: As in other cases, writing a well-specified prompt always helps! Also try first telling it to generate a report, then convert the report into slides!
6. Real-world performance > benchmark chasing: One thing outside people may not know about us is how little attention we pay to external benchmarks during the model dev process. We do not focus on hill-climbing on them, and we do not care that much about how we end up on the leaderboard. That said, as a byproduct of our pursuit of great real-world performance and true intelligence, ChatGPT Agent does crush many benchmarks! Wanna learn more? Read our blog linked at the end!
In the end, I want to shout out to my amazing team again. These extremely talented and kind people are the reason why OpenAI is constantly making magic like this! ❤️ Also please try ChatGPT Agent and give us feedback! You can reply here in the thread or my DM is open. This is just the start. We will continue working hard towards more and more capable super-human AI agents! 🤖 openai.com/index/introduc…

English
0
0
9
982
Casey Chu retweeted
OpenAI
OpenAI@OpenAI·
ZXX
641
705
7.6K
2.9M