Jack

833 posts

@0ranguchad

Physicist, Ape lover, et al.

Joined April 2024
59 Following · 36 Followers
𝙜𝙪𝙞𝙡𝙡𝙤𝙩𝙞𝙣𝙖
How life feels when you stop making Mewgenics teams based on what’s “Meta” and just do the classes you find fun to play
Anastasios Nikolas Angelopoulos
Why is GPT-5.5 lower than Claude? The answer is simple: Code Arena currently only supports frontend/web development tasks, where GPT-5.5 is weakest. Full-stack app development and GitHub integration will land in a couple months. Next time we'll be clearer that this leaderboard shows React/FE only, until we ship full-stack apps etc. Thanks for the feedback!
Arena.ai @arena

GPT-5.5 by @OpenAI is now live in the Arena, landing across multiple leaderboards. Here’s how it ranks by modality:
- Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4
- Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6
- Text Arena: #7, Math #3, Instruction Following: #8
- Expert Arena: #5
- Search Arena: #2
- Vision Arena: #5

Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Congrats to @OpenAI on the release. Full category breakdowns by modality in the thread.

Jack @0ranguchad
@Exiled_axe You set me up, you bastard
Jack @0ranguchad
@synthwavedd Which story do you think they’re gonna run with this time? That they’re A/B testing the plans, or that this was just an unintended bug?
leo 🐾 @synthwavedd
$20 Claude Pro users will soon no longer be able to use Opus models in Claude Code unless they purchase extra usage 🤦‍♂️
Jack @0ranguchad
@yuzu_4ever I’d also wager Tim’s poll is larger in scope, given that Twitter is accessible to anyone with internet. I think the US is among the generally higher-trust societies that would skew more blue, which is not the case globally.
yuzu @yuzu_4ever
@0ranguchad sorry i wasn't being clear. i didn't mean larger in the sense that it had more votes than the twitter poll.
Jack @0ranguchad
@yuzu_4ever Tim’s poll is way larger? 98k responses versus 2.6k. Also, again, there’s no actual threat of death being tested in this poll.
Jack @0ranguchad
@yuzu_4ever Which is not to say that I think everyone who’s picking blue is being dishonest, but that the majority of people, when confronting the actual finality of death, will choose to live. I’d predict ~80-90% red in the final poll.
Jack @0ranguchad
@yuzu_4ever That Tim’s poll is barely a win for blue when considering the sample size, lack of stakes, and the population being polled should be a very strong indication that red would win IRL. I’d like to say I’d still pick blue, but I think the honest answer is I wouldn’t self-sacrifice.
Jack @0ranguchad
@yuzu_4ever Pick it. I would mourn the loss of idealists whom I believe to have been misled, and I definitely think framing the question differently would produce different results (e.g., the blender hypothetical). Still, I think self-preservation is the dominant factor for most humans.
Jack @0ranguchad
@yuzu_4ever Ultimately, I’m red because I believe that when *real stakes* are offered, the majority of people will choose self-preservation. The finality of oblivion is far more threatening to confront than a Twitter poll. Logic dictates that red is the direct solution, so everyone should
yuzu @yuzu_4ever
@keikane_ claude pressed blue 😼
Jack @0ranguchad
@redtachyon Claude’s sycophancy is not new. It’s baked into the model.
Ariel @redtachyon
Genuine question - how do people get this delusional? Is it some new 4o-ish sycophancy effect from Claude? Anthropic and OpenAI are competitive with each other in SWE/coding. IMO Codex is far better than Claude, but I know smart people who think otherwise, and that's fine. There's no objective winner, they're just comparable products. At the same time, OpenAI also has some bets in image/video generation, world models, who knows what else. I see absolutely no moat that Anthropic has over OpenAI, there's no world in which they "win", EXCEPT for a fast-takeoff ASI that they potentially create before anyone else. (looking forward to being muted - I tend to not block people though)
Zhu Liang @paradite_

I’m muting people on X who don’t understand how far ahead Anthropic is. It’s hurting my brain so much that I had to do this. If you think OpenAI is in any way better than Anthropic, please just block me to save us both some trouble.

Jack @0ranguchad
@AcerFur Very exciting time for math research.
Acer @AcerFur
slowly then all at once....
Jack @0ranguchad
@sama GPT-6 wen
Sam Altman @sama
so fun to see the reception to 5.5! there is almost nothing that feels more gratifying to me than builders saying they find our tools useful.
Elliot Arledge @elliotarledge
KernelBench-Hard coming soon.