Jay T

4 posts

@originalJayTee

Statistics and Data Science enthusiast

Joined January 2020
260 Following, 4 Followers
Jay T @originalJayTee
@Steve8708 Should've released o3 tbh. Would've topped the charts and put OpenAI solidly in the lead again (and probably would've easily justified the $200 price point). If you look at the output of Deep Research it would easily have crushed the competition, even with message limits...
Steve (Builder.io) @Steve8708
the rumors were true… gpt 4.5 is bigger, slower, and not smarter
most disappointing openai release so far
Jay T @originalJayTee
@MLStreetTalk Yeah imagine they released o3 instead. No need to appeal to "high-taste", no criticisms etc and likely immense quality to justify the price (just like deep research)
Machine Learning Street Talk @MLStreetTalk
Folks really wanted to feel the AGI with this one. First a beautifully crafted damage limitation tweet, then a poll which actually showed GPT4.5 to be worse, then an appeal to "low-taste" (euphemism for stupid) people "overwhelming" the poll 😂 The main mitigation for the $150 per 1M output tokens is that with the glacial dial-up performance of ~2 tokens per second you still only rack up a bill of 20 cents per minute (right?). At 300x the price and worse performance than Deepseek R1 - this is a really tough sell boys.
Andrej Karpathy @karpathy

Okay so I didn't super expect the results of the GPT4 vs. GPT4.5 poll from earlier today 😅, of this thread: x.com/karpathy/statu…

✅ Question 1: GPT4.5 is A; 56% of people prefer it.
❌ Question 2: GPT4.5 is B; 43% of people prefer it.
❌ Question 3: GPT4.5 is A; 35% of people prefer it.
❌ Question 4: GPT4.5 is A; 35% of people prefer it.
❌ Question 5: GPT4.5 is B; 36% of people prefer it.

TLDR people prefer GPT4 in 4/5 questions awkward.

To be honest I found this a bit surprising, as I personally found GPT4.5 responses to be better in all cases. Maybe I'm just a "high-taste tester" ;). The thing to look for is that GPT4 more often says stuff that on the face of it looks fine and "type checks" as making sense, but if you really think about it longer and more carefully you will more often catch it saying things that are a bit of an odd thing to say, or are a little too formulaic, a little too basic, a little too cringe, or a little too tropy.

Slightly reassuringly a number of people noted similar surprise in the replies, e.g. the few I noticed as an example:
For the roast (Q2), 4.5 is "punchier" x.com/Danielledeco/s…
For the story (Q3), with 4.5 "narrative jumped in, had dialogue and hinted at a unique story line. b was a bit more schematic" x.com/MitjaMartini/s…
For the poem (Q4), 4.5 "is obviously way better. The rhyme scheme and meter of B are so unsophisticated, A has to be 4.5. The voters have poor taste." x.com/CNicholson1988…

So... yeah. Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we're just hallucinating things. Or these examples are just not that great. Or it's actually pretty close and this is way too small sample size. Or all of the above. So we'll just wait for the larger, more thorough LM Arena results.
But at least from my last 2 days of playing around, 4.5 has a new, deeper charm, it's more creative and inventive at writing, and I find myself laughing more at its jokes, standups and roasts. To be continued :)
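
A quick check of the per-minute cost estimate in the Machine Learning Street Talk post above, which that post itself hedges with "(right?)". This is only a sketch: it takes the two figures quoted there ($150 per 1M output tokens and roughly 2 tokens per second) at face value and assumes continuous output at that rate. The function and constant names are made up for illustration. Under those assumptions the bill comes to about 1.8 cents per minute rather than 20 cents.

```python
# Both inputs are the figures quoted in the post; they are assumptions here,
# not measured values.
PRICE_PER_MILLION_TOKENS_USD = 150.0  # quoted output-token price
TOKENS_PER_SECOND = 2.0               # quoted approximate generation speed


def cost_per_minute_usd(price_per_million_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per minute of continuous generation at a fixed output rate."""
    tokens_per_minute = tokens_per_second * 60
    return tokens_per_minute * price_per_million_usd / 1_000_000


if __name__ == "__main__":
    cost = cost_per_minute_usd(PRICE_PER_MILLION_TOKENS_USD, TOKENS_PER_SECOND)
    print(f"{cost * 100:.1f} cents per minute")
```

At the same price, 20 cents per minute would correspond to roughly 22 tokens per second, so the quoted figure looks like an order-of-magnitude slip; the underlying point about a low per-minute bill still holds.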

Jay T @originalJayTee
@xlr8harder @OpenAI If they released o3 (even the low version) it would've topped the evals and there would likely be much less criticism (except price maybe)...they're clearly not afraid of large and expensive models...nobody complains about the quality of deep research and o3 is behind that.
xlr8harder @xlr8harder
I want to applaud @OpenAI on releasing GPT-4.5. It's not a benchmark beater, and they released it anyway. That takes some courage on their part, because they will get a lot of dumb criticism on eval scores. (If you think it needs to top evals to be valuable, you are wrong.)
Jay T @originalJayTee
Even though I'm posting this to get a 35% off coupon, I can't avoid saying it's freaking cool: lo.cafe/notchnook