Jack

833 posts

@0ranguchad

Physicist, Ape lover, et al.

Joined April 2024

59 Following · 36 Followers

Jack@0ranguchad·3h

@IAlwaysEatCats @addiswag1 Yeah butcher and hunter are both strong as hell

𝙜𝙪𝙞𝙡𝙡𝙤𝙩𝙞𝙣𝙖@IAlwaysEatCats·8h

@addiswag1 I sent three butchers and 1 hunter into the future and they killed hitler

293

𝙜𝙪𝙞𝙡𝙡𝙤𝙩𝙞𝙣𝙖@IAlwaysEatCats·10h

How life feels when you stop making mewgenics teams based on what’s “Meta” and just do the classes you find fun to play

477

7.6K

Jack@0ranguchad·6h

@ml_angelopoulos So it's not really CodeArena, but DesignArena

374

Anastasios Nikolas Angelopoulos@ml_angelopoulos·7h

Why GPT-5.5 is lower than Claude? The answer is simple: Code Arena currently only supports frontend/web development tasks, where GPT-5.5 is weakest. Full-stack app development and GitHub integration will land in a couple months. Next time we'll be clearer that this leaderboard shows React/FE only, until we ship full-stack apps etc. Thanks for the feedback!

Arena.ai@arena

GPT-5.5 by @OpenAI is now live in the Arena, landing across multiple leaderboards. Here’s how it ranks by modality: - Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4 - Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6 - Text Arena: #7, Math #3, Instruction Following: #8 - Expert Arena: #5 - Search Arena: #2 - Vision Arena: #5 Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Congrats to @OpenAI on the release. Full category breakdowns by modality in the thread.

414

60.1K

Jack@0ranguchad·6h

@Exiled_axe You set me up you bastard

7.1K

Reice Stark@Exiled_axe·1d

Wow what an extreme thing to say for the sake of humor. I wonder what he posted to subvert our expectations

QT Patch@WITT_SZN7

Lane Thomas walkoff and I’ll post my fully erect shaft and balls

4.4K

516.5K

Jack@0ranguchad·7h

@synthwavedd Which story do you think they’re gonna run with this time? They’re A/B testing the plans, or this was just an unintended bug?

1.2K

leo 🐾@synthwavedd·8h

$20 Claude Pro users will soon no longer be able to use Opus models in Claude Code unless they purchase extra usage 🤦‍♂️

207

191

2.9K

506.9K

Jack@0ranguchad·9h

@yuzu_4ever I’d also wager Tim’s poll is larger in scope, given that Twitter is accessible to anyone with internet. I think the US is among the generally higher trust societies that would skew more blue, which is not the case globally.

yuzu@yuzu_4ever·9h

@0ranguchad sorry i wasn't being clear. i didn't mean larger in the sense that it had more votes than the twitter poll.

yuzu@yuzu_4ever·2d

i would rather die than live in a world only composed of red button pressers.

Tim Urban@waitbutwhy

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

148

634

30.5K

Jack@0ranguchad·9h

@yuzu_4ever Tim’s poll is way larger? 98k responses versus 2.6k. Also, again, there’s no actual threat of death being tested in this poll.

yuzu@yuzu_4ever·9h

@0ranguchad idk in larger studies the gap is much wider. twitter skews a certain way. x.com/davidshor/stat…

David Shor@davidshor

We asked this to a large sample of nationally representative Americans - blue wins by a 3:1 margin!

Jack@0ranguchad·9h

@scaling01 $9.64 per task 💀

185

Lisan al Gaib@scaling01·10h

The IQ mogging continues on PencilPuzzleBench

Lisan al Gaib@scaling01

two facts - Opus 4.7 is a decent upgrade. if it's worse for you it's a skill issue - GPT-5.5 will absolutely IQ mog Opus 4.7

6.5K

Jack@0ranguchad·9h

@yuzu_4ever Which is not to say that I think everyone who’s picking blue is being dishonest, but that the majority of people, when confronting the actual finality of death, will choose to live. I’d predict ~80-90% red in the final poll.

Jack@0ranguchad·10h

@yuzu_4ever That Tim’s poll is barely a win for blue when considering the sample size, lack of stakes, and the population being polled should be a very strong indication that red would win IRL. I’d like to say I’d still pick blue, but I think the honest answer is I wouldn’t self-sacrifice.

Jack@0ranguchad·10h

@yuzu_4ever Pick it. I would mourn the loss of idealists whom I believe to have been misled, and I definitely think framing the question differently would produce different results (eg the blender hypothetical). Still, I think self preservation is the dominant factor for most humans.

Jack@0ranguchad·10h

@yuzu_4ever Ultimately, I’m red because I believe that when *real stakes* are offered, the majority of people will choose self preservation. The finality of oblivion is far more threatening to confront than a Twitter poll. Logic dictates that red is the direct solution, so everyone should

Jack@0ranguchad·10h

@yuzu_4ever @keikane_ RIP Claude

yuzu@yuzu_4ever·10h

@keikane_ claude pressed blue 😼

634

Jack@0ranguchad·11h

@redtachyon Claude’s sycophancy is not new. It’s baked into the model.

167

Ariel@redtachyon·12h

Genuine question - how do people get this delusional? Is it some new 4o-ish sycophancy effect from Claude? Anthropic and OpenAI are competitive with each other in SWE/coding. IMO Codex is far better than Claude, but I know smart people who think otherwise, and that's fine. There's no objective winner, they're just comparable products. At the same time, OpenAI also has some bets in image/video generation, world models, who knows what else. I see absolutely no moat that Anthropic has over OpenAI, there's no world in which they "win", EXCEPT for a fast-takeoff ASI that they potentially create before anyone else. (looking forward to being muted - I tend to not block people though)

Zhu Liang@paradite_

I’m muting people on X who don’t understand how far ahead Anthropic is. It’s hurting my brain so much that I had to do this. If you think OpenAI is in any way better than Anthropic, please just block me to save both of us some trouble.

5.8K

Jack@0ranguchad·11h

@AcerFur Very exciting time for math research.

787