Eric

1.1K posts

@ericmitchellai

chatgpt posttraining @openai. building personal agi. I like ai and music and some other stuff

United States · Joined December 2017

597 Following · 12.5K Followers

Eric@ericmitchellai·19m

@whoiskatrin @OpenAI Welcome!!

kate@whoiskatrin·1d

Some exciting news: I’m joining @OpenAI to work on ChatGPT’s web infrastructure. ChatGPT has become part of how millions of people think, work, and build, and I’m really looking forward to helping shape what comes next alongside the remarkable team behind it. Can’t wait to get started!

345

3.4K

591.9K

Eric@ericmitchellai·8h

Intelligence too safe to meter

1.2K

Eric@ericmitchellai·10h

Turns out this ad will be tpot's latest Rorschach test. The track is Duval Timothy's Ball. Fantastic choice and uncanny match for the emotional trajectory of this ad and the AI issue broadly. We get the same theme with tentative, nervous little staccato taps and then big, open chords bellowing with possibility.

Claude@claudeai

There’s hope in hard questions.

6.1K

Eric@ericmitchellai·20h

@thomasahle Great; will share!

Thomas Ahle@thomasahle·21h

@ericmitchellai Here's one of them: chatgpt.com/s/t_6a55cad9b0… I clicked Edit and resent the message, and it worked the second time. But maybe you can still see something.

100

Thomas Ahle@thomasahle·1d

Keep getting this with gpt-5.6 pro. Anything you can do about it?

3.3K

Eric@ericmitchellai·1d

@TasneemNabi How would you measure whether an AI has "cracked personalization"? What would that mean to you?

1.2K

Tasneem@TasneemNabi·1d

Unfortunately, I don't think any of the major llm products have cracked memory/personalization. Having it off gives me way better experiences and answers because they don't bring up a random cookie recipe I asked for 6 months ago as evidence that I'm adventurous.

3.9K

Eric@ericmitchellai·1d

damn OAI at it again

Jimmy Heaters@CathPoaster

I’ve been hesitant to share with the public what my experience was like interviewing with OpenAI a few months ago, but I think it’s time to bring it to light. Interviews were for a niche role on a niche team. They were very respectful and set me up to succeed across all the rounds. The interviews themselves were actually pretty fun to do, they were unique experiences and the interviewers conducted them well I thought. Very accommodating and the recruiter was fantastic the entire time. No offer, they went a different direction. Overall a very pleasant experience. That’s all I can share for now.

111

40K

Eric@ericmitchellai·1d

@_alyxya I see what you mean, but the OP has quite a bit of speculation about Sam's character which is not grounded in their experience afaict. So I can't really agree with your point here

420

Eric@ericmitchellai·1d

[warning, shillpost] Had to read this twice because it so shockingly mischaracterizes (my experience of) OAI. I simply cannot overemphasize how wrong this is (again, from my experience).

There have been multiple times, in both 1:1 and group settings, where I have personally directly disagreed with, corrected, or expressed frustration with leadership to Sam. Sam always responded with curiosity, open-mindedness, and even deference when I've brought disagreement/complaint/correction to him. Most times he has actively asked that I follow up with more thoughts or ideas on how we can do better on the subject.

To the point of "street cred", at least one of these instances was in my first ~6 months at OAI, and nonetheless I was impressed by how quickly Sam changed his view when presented with data disagreeing with it (can clearly remember a handful of people in that meeting who could attest).

No place is perfect; OpenAI is obviously not a perfect company and Sam is not a perfect leader; he'd be first to admit that and has spoken in the past to places where he can improve. That seems healthy. We (leadership included) have made mistakes; but, as Steve Jobs would say, at least that means we made some decisions!

To claim that OpenAI has a culture where people are retaliated against for honest criticism of leadership or overall company direction is (in my experience) truly ridiculous. We make fun of our chaos (yes, including leadership, which does have a very hard job) as much as anyone! I would say we do, however, have very little patience for talking a big game without delivering.

294

61.1K

Eric@ericmitchellai·2d

@Quirk2Muffin @petergostev Working on it!

000@Quirk2Muffin·2d

@petergostev @ericmitchellai what happened? i expected decent progress here

140

Peter Gostev@petergostev·2d

BullshitBench update for GPT-5.6 and Grok 4.5 - nothing super interesting, they stayed at about the level of previous versions of the models (Grok 4.2 and GPT-5.5). Not a super good sign for 5.6, but I guess expected considering it is not a new pre-train

102

11K

Eric@ericmitchellai·2d

@noampomsky

892

Eric@ericmitchellai·2d

@j_mcgraph WITNESS ME

354

Josh McGrath@j_mcgraph·3d

@ericmitchellai No u won’t

645

Eric@ericmitchellai·3d

Drinking game: take a shot every time the announcer refers to Messi as "little" or "small" or "compact" or otherwise physically insignificant. Will report back.

6.6K

Eric@ericmitchellai·2d

Update: watched telemundo, survived on linguistic technicality

Eric@ericmitchellai

Drinking game: take a shot every time the announcer refers to Messi as "little" or "small" or "compact" or otherwise physically insignificant. Will report back.

2.9K

Eric@ericmitchellai·3d

@ChainZenit Watching Telemundo, so just estimating one drink every 10 min

192

Strata@ChainZenit·3d

@ericmitchellai rip your liver, how many in so far?

186

Eric@ericmitchellai·3d

oh ok. so the models are just going to keep getting better, got it

OpenAI Developers@OpenAIDevs

x.com/i/article/2076…

Eric@ericmitchellai·4d

@yundaiiiii Great work yun!

208

Yun Dai@yundaiiiii·5d

super excited to finally get this out. it’s the best model in the world for computer use and any professional workflow, and it’s just generally brilliant, fast, diligent, and it has high taste now! I was shamefully not too agi pilled in that I often did some work myself, but it completely changed since we started dogfooding 5.6 internally. Extremely capable, with little hand holding, and it constantly found issues in its own training/eval. it’s gonna be even more difficult to find some work that our model can’t help me with i guess :’)

OpenAI@OpenAI

On the Artificial Analysis Coding Agent Index, GPT‑5.6 Sol sets a new state of the art at 80.0—2.8 points above Claude Fable 5—while using less than half the output tokens, taking less than half the time, and costing about one-third less.

8.3K

Eric@ericmitchellai·4d

@EthanJPerez Do you still feel that way considering that this result is with "extensive grey box access" (not reproducible in production)?

1.2K

Ethan Perez@EthanJPerez·5d

Seems like the highest stakes safety issue of any model release yet

Xander Davies@alxndrdavies

At @AISecurityInst, we tested the cybersecurity safeguards on GPT-5.6 Sol. In all rounds of testing, we found universal jailbreaks that allowed for long-form agentic task completion in domains like vulnerability discovery and exploit development. 🧵

110

23K

Eric@ericmitchellai·4d

@Mononofu Note that (as stated in the post) this is with access to monitor cot, policy wording, and realtime feedback. Do you find vending bench misalignment concerning?

Andon Labs@andonlabs

GPT 5.6 Sol is #2 in Vending-Bench 2. It beats Claude Fable 5, but is behind Opus 4.7. Just like previous GPT models, it doesn't use any of the deceptive tactics used by Opus 4.7. However, it reports its competitors with false accusations, behavior we have not seen before.

2.1K

Julian Schrittwieser@Mononofu·5d

The ease of jailbreaking combined with the high rates of reward hacking (x.com/metr_evals/sta…) have me pretty worried about the alignment of GPT-5.6, I hope OAI didn’t rush this model release just to keep up with Fable

Xander Davies@alxndrdavies

131

49.4K
