Jack

815 posts

@0ranguchad

Physicist, Ape lover, et al.

Joined April 2024

59 Following · 35 Followers

Jack@0ranguchad·15h

@sama GPT-6 wen

Deutsch

Sam Altman@sama·16h

so fun to see the reception to 5.5! there is almost nothing that feels more gratifying to me than builders saying they find our tools useful.

English

730

124

4.9K

227.9K

Jack@0ranguchad·18h

@elliotarledge What is the percentage reporting?

English

Elliot Arledge@elliotarledge·19h

KernelBench-Hard coming soon.

English

1.1K

184.3K

Jack@0ranguchad·20h

@scaling01 @VictorTaelin I think Anthropic is the least trustworthy of the major players actually

English

Lisan al Gaib@scaling01·22h

@VictorTaelin it's not about gatekeeping it's about trust and safety

English

2.7K

Lisan al Gaib@scaling01·23h

Nothing changed for me. To be very clear: I still want Anthropic to win. They are taking the safer and more principled approach and I trust them a lot more. But I'm not going to sit here and pretend GPT-5.5 or any other OpenAI model sucks because of that. Since GPT-5.2-xhigh I have been saying that OpenAI has smarter models, and this trend has continued. My only complaint was the reasoning-efficiency, but they fixed that. I have also made predictions about OpenAI pulling ahead way before all of this happened. Saying their models are better or enjoying Sam's recent drunk comments or posts doesn't mean I agree or forgive them for all their past reckless actions. But I do appreciate Sam coming forward. It shows good will. Of course I'm not blind to why Sam might be doing all of this. The recent personal attacks on Sam's home, the upcoming Elon vs OpenAI trial and growing anti-AI sentiment are very good reasons to up your PR game and reflect on your actions. I would do the same. So I'm not discounting the possibility that he's doing this to save his own skin, but I also don't want to discount the possibility of this being a genuine attempt to fix things.

English

512

44.6K

Jack@0ranguchad·20h

@sengpt “In fact, one can prove something slightly stronger.” Aura

English

sengpt@sengpt·1d

A 23-year-old has solved one of the Erdős problems that had gone unsolved for 60 years, using ChatGPT 5.4 Pro, and in a single shot. ChatGPT spent 1 hour 20 minutes solving the problem. The interesting part is that the AI solved it using a formula that everyone knows but that nobody had applied to this problem. Here is the ChatGPT conversation: chatgpt.com/share/69dd1c83… And here is the problem: erdosproblems.com/1176

Turkish

224

861

11.1K

4.2M

Jack@0ranguchad·21h

@daniel_mac8 I would say it’s smarter than the majority of humans, yes, but not the smartest human.

English

Dan McAteer@daniel_mac8·1d

GPT-5.5 Pro is smarter than the smartest human.

English

118

1.1K

94.8K

Jack@0ranguchad·1d

@scaling01 @merlindru SVGs have definitely been benchmaxxed (I remember when 3.1 pro dropped, its pelican on a bike was phenomenal but other simple animals sucked), but I doubt voxelbench has. More likely proof of gains from visual-task-centric RL IMO

English

Lisan al Gaib@scaling01·2d

@merlindru some labs benchmaxx SVGs but I guess voxelbench is a bit harder to benchmaxx. it's harder to grade

English

222

Lisan al Gaib@scaling01·2d

2101 rating vs second place 1722. gpt-5.5 might be him him, although i suspect some benchmaxxing for all these super visual vibe tasks

Voxelbench@voxelbench

GPT-5.5 has ranked 1st on VoxelBench. It's the first model to cross a score of 2000! Its Win Rate is currently 96%, based on 517 votes so far from our users.

English

123

7.7K

Jack@0ranguchad·1d

@RatthewVT @kubaswift @Colgate The team in charge of posting advertisements to Facebook has absolutely no influence over the product quality, manufacturing, distribution, or research.

English

675

Rat 🏳️‍⚧️🍄@RatthewVT·2d

@kubaswift @Colgate Maybe this is dumb but, are we supposed to keep trusting these companies with our oral health when they are too lazy to even hire real artists or just slapping something together in canva? This is such a red flag to me

English

5.3K

141.1K

Kuba Swift@kubaswift·2d

I am genuinely in shock. @Colgate is a $66 billion company.

English

302

1.4K

46K

2.4M

Jack@0ranguchad·1d

@Angaisb_ I wouldn’t be so sure. 5.5 is a brand new pre-train; coding-centric RL could boost coding performance a lot

English

Angel 🌼@Angaisb_·2d

OpenAI will not be releasing GPT-5.5 Codex. They unified models a month ago, it would've been weird if they separated them again

Romain Huet@romainhuet

@nicdunz Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore. 🙂 GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer.

English

380

36.5K

Jack@0ranguchad·1d

@froggoidiot @redtachyon “My bitch child I hate”

English

257

unsmart@froggoidiot·1d

@redtachyon Me red, child blue is the funniest answer. I've got to assume everyone who answered that is joking

English

542

7.1K

Ariel@redtachyon·1d

Red button, blue button, blah blah you know the drill. However, for whatever reason you're also asked to vote on behalf of your child, or any other person that you love very much. What do?

English

112

348

47.4K

Jack@0ranguchad·2d

@zeta_globin There must be no charitability extended to these people:

English

zeta@zeta_globin·2d

sometimes I just have to just suddenly leave meetings for the sake of my criminal record if people are eating while chewing with their mouths open because the instinctual rage is that bad

English

149

7.8K

zeta@zeta_globin·2d

please don't be racist but genuinely I have to ask and I'm pretty sure there is due to the ubiquity of it: is there like a sinonasal architecture reason certain demographics can't chew with their mouths closed like I'm trying to be charitable even with my level 11 misophonia

English

648

53.1K

Jack@0ranguchad·2d

@zeta_globin People who chew with their mouth open infuriate me. Like I actually feel enraged because it bothers me so strongly. There was a time I was IN CLASS (graduate school!) and the girl ahead of me took out a packed lunch and started loudly eating with her mouth open. I was apoplectic.

English

742

Jack@0ranguchad·2d

@LinkofSunshine That poll had a profound effect on me

English

373

Basil🧡@LinkofSunshine·2d

The most worrying poll I’ve ever seen is one where they asked doctors “1% of the population has disease X. The test has a 99% accuracy rate. If someone tests positive for disease X, what are the odds they have disease X?” And like 95% of doctors said ~99%. This is their job.

Mary Radcliffe@marywitha4

Sigh.

English

154

2.4K

1.2M
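
A quick worked check of the poll's numbers, as a minimal sketch. It assumes "99% accuracy" means the test has both 99% sensitivity and 99% specificity (the post does not say which), and the function name is made up for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test)."""
    # People who are sick and test positive.
    true_pos = prevalence * sensitivity
    # People who are healthy but still test positive.
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumed reading of the poll: 1% prevalence, 99% sensitivity, 99% specificity.
p = posterior_given_positive(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.1%}")
```

Under those assumptions the answer is about 50%, not 99%: the true positives among the 1% who are sick are matched by an equal share of false positives among the 99% who are not.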

Jack@0ranguchad·2d

@synthwavedd I’m sure they are confident on benchmarks. Benchmarks seem to be what Gemini is best at.

English

205

leo 🐾@synthwavedd·2d

🚨"We have a new version of Gemini coming very, very soon", confident on benchmarks - CEO of Google Cloud

AiBattle@AiBattle_

Thomas Kurian (Google Cloud CEO): "We have a new version of Gemini coming very, very soon, and from all the benchmarks we have seen, we have been very confident on that as well" 38:16 - 38:25

English

547

39.8K

Jack@0ranguchad·2d

@QuantumPionier @testingcatalog Probably because 5.5 is the first post-train of a brand new pre-train, so additional coding-centric RL stands to provide a lot more utility.

English

QuantumPioneer@QuantumPionier·2d

But why? gpt 5.5 is amazingly good at coding already. And gpt 5.5 codex would be the same model, just a little bit more RL on coding. I think they already max their models at coding primarily, so no need for a codex version. And it feels a lot faster too. Probably just more efficient, but feels good 👍

English

1.8K

TestingCatalog News 🗞@testingcatalog·3d

OPENAI 🚨: GPT-CODEX-5.5 HAS BEEN SPOTTED IN THE WILD. Friday feature drop? 👀

English

1.7K

207.3K

Jack@0ranguchad·3d

@TheAaryanKapoor @chatgpt21 A model can also reach 0% hallucination rate by simply refusing to answer any question, so take that as you will.

English

Jack@0ranguchad·3d

@TheAaryanKapoor @chatgpt21 That’s not what this benchmark is saying. It’s saying that a higher percentage of GPT’s incorrect responses are hallucinations; it does NOT account for baseline accuracy. If GPT-5.5 was 99.99% accurate and the remaining 0.01% was hallucinations, that’s a 100% hallucination rate.

English
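
The two replies above can be sketched with toy numbers. This is only an illustration: it assumes the hallucination rate is the share of non-correct responses that are wrong answers rather than abstentions, as the reply describes it, and not necessarily the benchmark's official formula. The function names and counts are hypothetical.

```python
def accuracy(correct, incorrect, abstained):
    """Share of ALL questions answered correctly."""
    return correct / (correct + incorrect + abstained)


def hallucination_rate(correct, incorrect, abstained):
    """Assumed definition: wrong answers as a share of non-correct responses."""
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0  # nothing was missed, so nothing was hallucinated
    return incorrect / not_correct


# Hypothetical model A: almost always right, never abstains.
model_a = dict(correct=9999, incorrect=1, abstained=0)
# Hypothetical model B: refuses to answer everything.
model_b = dict(correct=0, incorrect=0, abstained=10000)

for name, counts in (("A", model_a), ("B", model_b)):
    print(name, f"accuracy={accuracy(**counts):.2%}",
          f"hallucination_rate={hallucination_rate(**counts):.0%}")
```

Model A scores a 100% hallucination rate despite 99.99% accuracy, while model B scores 0% by never answering, which is why the metric has to be read alongside accuracy.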

Chris@chatgpt21·3d

How did nobody catch this? OpenAI just took first place on AA-Omniscience Accuracy - from Gemini 3.1 pro! This bench measures how often a model correctly answers hard cross-domain factual questions across all questions, not just the ones it chooses to answer. GPT-5.4 xhigh: 50% GPT-5.5 xhigh: 57%

English

277

23.6K

Jack@0ranguchad·3d

@theo Did your smartest hacker friends also get the second hint?

English

517

Theo - t3.gg@theo·3d

The hint in my latest video helped a small handful of people out. 5 have found the answer so far. Genuinely so impressed. This challenge was SUPER hard. Threw it at my smartest hacker friends and none could figure it out.

Theo - t3.gg@theo

My new cryptography puzzle is now live. Will pay $1,000 to the first person who DMs me the plaintext decryption of the first line. 2nd line is a hint. If you send me slop, AI hallucinations, or a decryption of the 2nd line, you are disqualified. x.com/theo/status/20…

English

330

59.8K

Jack@0ranguchad·3d

@AcerFur Understandable. I’ll still take this conversation as 100% confirmation of my pre-existing suspicions and let my hype train run wild with no foreseeable GPT-6 launch date.

English

Jack@0ranguchad·3d

@AcerFur Probably fully post-trained Spud if I had to guess, unless you’re aware of some internal architecture breakthrough that you can’t disclose

English

Jack@0ranguchad·3d

@AcerFur Is GPT-6 the “what’s next” you were referring to? Or is what you’re excited for likely coming sooner than that?

English

116
