Jack

815 posts

Jack

@0ranguchad

Physicist, Ape lover, et al.

Joined April 2024
59 Following 35 Followers
Jack
Jack@0ranguchad·
@sama GPT-6 wen
German
0
0
0
8
Sam Altman
Sam Altman@sama·
so fun to see the reception to 5.5! there is almost nothing that feels more gratifying to me than builders saying they find our tools useful.
English
730
124
4.9K
227.9K
Elliot Arledge
Elliot Arledge@elliotarledge·
KernelBench-Hard coming soon.
Elliot Arledge tweet media
English
32
37
1.1K
184.3K
Lisan al Gaib
Lisan al Gaib@scaling01·
Nothing changed for me. To be very clear: I still want Anthropic to win. They are taking the safer and more principled approach and I trust them a lot more. But I'm not going to sit here and pretend GPT-5.5 or any other OpenAI model sucks because of that. Since GPT-5.2-xhigh I have been saying that OpenAI has smarter models, and this trend has continued. My only complaint was the reasoning efficiency, but they fixed that. I also predicted OpenAI pulling ahead well before all of this happened. Saying their models are better, or enjoying Sam's recent drunk comments or posts, doesn't mean I agree with or forgive them for all their past reckless actions. But I do appreciate Sam coming forward. It shows goodwill. Of course I'm not blind to why Sam might be doing all of this. The recent personal attacks on Sam's home, the upcoming Elon vs OpenAI trial and growing anti-AI sentiment are very good reasons to up your PR game and reflect on your actions. I would do the same. So I'm not discounting the possibility that he's doing this to save his own skin, but I also don't want to discount the possibility of this being a genuine attempt to fix things.
English
57
12
512
44.6K
Jack
Jack@0ranguchad·
@sengpt “In fact, one can prove something slightly stronger.” Aura
English
0
0
0
1K
sengpt
sengpt@sengpt·
A 23-year-old has reportedly solved one of the Erdős problems that had gone unsolved for 60 years, using ChatGPT 5.4 Pro, and in a single shot. ChatGPT spent 1 hour 20 minutes on the problem. The interesting part is that the AI solved it using a formula that everyone knows but nobody had applied to this problem. Here is the ChatGPT conversation: chatgpt.com/share/69dd1c83… And here is the problem: erdosproblems.com/1176
sengpt tweet media
Turkish
224
861
11.1K
4.2M
Jack
Jack@0ranguchad·
@daniel_mac8 I would say it’s smarter than the majority of humans, yes, but not the smartest human.
English
0
0
1
18
Dan McAteer
Dan McAteer@daniel_mac8·
GPT-5.5 Pro is smarter than the smartest human.
English
118
47
1.1K
94.8K
Jack
Jack@0ranguchad·
@scaling01 @merlindru SVGs have definitely been benchmaxxed (I remember when 3.1 pro dropped, its pelican on a bike was phenomenal but other simple animals sucked), but I doubt voxelbench has. More likely proof of gains from visual-task-centric RL IMO
English
0
0
2
17
Lisan al Gaib
Lisan al Gaib@scaling01·
@merlindru some labs benchmaxx SVGs but I guess voxelbench is a bit harder to benchmaxx. it's harder to grade
English
1
0
4
222
Jack
Jack@0ranguchad·
@RatthewVT @kubaswift @Colgate The team in charge of posting advertisements to Facebook has absolutely no influence over the product quality, manufacturing, distribution, or research.
English
1
0
0
675
Rat 🏳️‍⚧️🍄
@kubaswift @Colgate Maybe this is dumb but, are we supposed to keep trusting these companies with our oral health when they are too lazy to even hire real artists or just slapping something together in canva? This is such a red flag to me
English
45
82
5.3K
141.1K
Kuba Swift
Kuba Swift@kubaswift·
I am genuinely in shock. @Colgate is a $66 billion company.
Kuba Swift tweet media (3 items)
English
302
1.4K
46K
2.4M
Jack
Jack@0ranguchad·
@Angaisb_ I wouldn’t be so sure. 5.5 is a brand new pre-train; coding-centric RL could boost coding performance a lot
English
0
0
0
66
Angel 🌼
Angel 🌼@Angaisb_·
OpenAI will not be releasing GPT-5.5 Codex. They unified models a month ago, it would've been weird if they separated them again
Romain Huet@romainhuet

@nicdunz Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore. 🙂 GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer.

English
17
6
380
36.5K
unsmart
unsmart@froggoidiot·
@redtachyon Me red, child blue is the funniest answer. I've got to assume everyone who answered that is joking
English
16
0
542
7.1K
Ariel
Ariel@redtachyon·
Red button, blue button, blah blah you know the drill. However, for whatever reason you're also asked to vote on behalf of your child, or any other person that you love very much. What do?
English
112
17
348
47.4K
Jack
Jack@0ranguchad·
@zeta_globin There must be no charitability extended to these people:
English
0
0
0
7
zeta
zeta@zeta_globin·
sometimes I just have to suddenly leave meetings for the sake of my criminal record if people are chewing with their mouths open because the instinctual rage is that bad
English
5
0
149
7.8K
zeta
zeta@zeta_globin·
please don't be racist but genuinely I have to ask and I'm pretty sure there is due to the ubiquity of it: is there like a sinonasal architecture reason certain demographics can't chew with their mouths closed like I'm trying to be charitable even with my level 11 misophonia
English
38
1
648
53.1K
Jack
Jack@0ranguchad·
@zeta_globin People who chew with their mouth open infuriate me. Like I actually feel enraged because it bothers me so strongly. There was a time I was IN CLASS (graduate school!) and the girl ahead of me took out a packed lunch and started loudly eating with her mouth open. I was apoplectic.
English
0
0
2
742
Basil🧡
Basil🧡@LinkofSunshine·
The most worrying poll I’ve ever seen is one where they asked doctors: “1% of the population has disease X. The test has a 99% accuracy rate. If someone tests positive for disease X, what are the odds they have disease X?” And like 95% of doctors said ~99%. This is their job.
Mary Radcliffe@marywitha4

Sigh.

English
154
34
2.4K
1.2M
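The arithmetic behind the disease X question in the post above can be checked with Bayes' rule. This is a minimal sketch, assuming that "99% accuracy" means both sensitivity and specificity are 99% (the post does not say which); the function name `posterior_given_positive` is made up for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    # People who have the disease and test positive
    true_positives = prevalence * sensitivity
    # People who do not have the disease but still test positive
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumption: "99% accuracy" = 99% sensitivity and 99% specificity
p = posterior_given_positive(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"{p:.1%}")  # about 50%, not 99%
```

Under that reading, true positives and false positives are equally common (0.99% of the population each), so a positive result is roughly a coin flip.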
Jack
Jack@0ranguchad·
@synthwavedd I’m sure they are confident on benchmarks. Benchmarks seem to be what Gemini is best at.
English
0
0
2
205
Jack
Jack@0ranguchad·
@QuantumPionier @testingcatalog Probably because 5.5 is the first post-train of a brand new pre-train, so additional coding-centric RL stands to provide a lot more utility.
English
0
0
2
88
QuantumPioneer
QuantumPioneer@QuantumPionier·
But why? gpt 5.5 is amazingly good at coding already. And gpt 5.5 codex would be the same model, just a little bit more RL on coding. I think they already max their models at coding primarily, so no need for a codex version. And it feels a lot faster too. Probably just more efficient, but feels good 👍
English
1
0
4
1.8K
TestingCatalog News 🗞
TestingCatalog News 🗞@testingcatalog·
OPENAI 🚨: GPT-CODEX-5.5 HAS BEEN SPOTTED IN THE WILD. Friday feature drop? 👀
TestingCatalog News 🗞 tweet media (2 items)
English
59
59
1.7K
207.3K
Jack
Jack@0ranguchad·
@TheAaryanKapoor @chatgpt21 A model can also reach 0% hallucination rate by simply refusing to answer any question, so take that as you will.
English
0
0
0
18
Jack
Jack@0ranguchad·
@TheAaryanKapoor @chatgpt21 That’s not what this benchmark is saying. It’s saying that a higher percentage of GPT’s incorrect responses are hallucinations; it does NOT account for baseline accuracy. If GPT-5.5 were 99.99% accurate and the remaining 0.01% were hallucinations, that’s a 100% hallucination rate.
English
1
0
2
42
Chris
Chris@chatgpt21·
How did nobody catch this? OpenAI just took first place on AA-Omniscience Accuracy - from Gemini 3.1 pro! This bench measures how often a model correctly answers hard cross-domain factual questions across all questions, not just the ones it chooses to answer. GPT-5.4 xhigh: 50% GPT-5.5 xhigh: 57%
Chris tweet media
English
9
16
277
23.6K
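The two replies in this thread (a 99.99%-accurate model scoring a 100% hallucination rate, and an always-refusing model scoring 0%) can be reproduced with a small sketch. The definition used here is an assumption reconstructed from the replies, not taken from the benchmark's documentation: hallucination rate is the share of non-correct responses that are wrong answers rather than refusals. The function names and counts are hypothetical.

```python
def accuracy(correct, incorrect, abstained):
    """Share of ALL questions answered correctly."""
    return correct / (correct + incorrect + abstained)


def hallucination_rate(correct, incorrect, abstained):
    """Assumed definition: wrong answers / (wrong answers + refusals)."""
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0  # nothing was missed, so nothing was hallucinated
    return incorrect / not_correct


# Hypothetical model A: almost always right, never refuses
a = dict(correct=9999, incorrect=1, abstained=0)
# Hypothetical model B: refuses every question
b = dict(correct=0, incorrect=0, abstained=10000)

for name, counts in (("A", a), ("B", b)):
    print(name, accuracy(**counts), hallucination_rate(**counts))
```

Under this assumed definition, model A pairs 99.99% accuracy with a 100% hallucination rate, while model B pairs 0% accuracy with a 0% hallucination rate, which is why the two numbers need to be read together.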
Jack
Jack@0ranguchad·
@theo Did your smartest hacker friends also get the second hint?
English
0
0
1
517
Theo - t3.gg
Theo - t3.gg@theo·
The hint in my latest video helped a small handful of people out. 5 have found the answer so far. Genuinely so impressed. This challenge was SUPER hard. Threw it at my smartest hacker friends and none could figure it out.
Theo - t3.gg@theo

My new cryptography puzzle is now live. Will pay $1,000 to the first person who DMs me the plaintext decryption of the first line. 2nd line is a hint. If you send me slop, AI hallucinations, or a decryption of the 2nd line, you are disqualified. x.com/theo/status/20…

English
27
4
330
59.8K
Jack
Jack@0ranguchad·
@AcerFur Understandable. I’ll still take this conversation as 100% confirmation of my pre-existing suspicions and let my hype train run wild with no foreseeable GPT-6 launch date.
English
0
0
0
47
Jack
Jack@0ranguchad·
@AcerFur Probably fully post-trained Spud if I had to guess, unless you’re aware of some internal architecture breakthrough that you can’t disclose
English
0
0
0
69
Jack
Jack@0ranguchad·
@AcerFur Is GPT-6 the “what’s next” you were referring to? Or is what you’re excited for likely coming sooner than that?
English
0
0
1
116