
Jay T
@originalJayTee
Statistics and Data Science enthusiast



Okay, so I didn't super expect the results of the GPT4 vs. GPT4.5 poll from earlier today 😅 (from this thread: x.com/karpathy/statu…).

✅ Question 1: GPT4.5 is A; 56% of people prefer it.
❌ Question 2: GPT4.5 is B; 43% of people prefer it.
❌ Question 3: GPT4.5 is A; 35% of people prefer it.
❌ Question 4: GPT4.5 is A; 35% of people prefer it.
❌ Question 5: GPT4.5 is B; 36% of people prefer it.

TLDR: people prefer GPT4 in 4/5 questions. Awkward.

To be honest, I found this a bit surprising, as I personally found the GPT4.5 responses to be better in all cases. Maybe I'm just a "high-taste tester" ;). The thing to look for is that GPT4 more often says stuff that on the face of it looks fine and "type checks" as making sense, but if you think about it longer and more carefully, you will more often catch it saying things that are a bit odd, or a little too formulaic, a little too basic, a little too cringe, or a little too tropy.

Slightly reassuringly, a number of people noted similar surprise in the replies. A few I noticed:

For the roast (Q2), 4.5 is "punchier": x.com/Danielledeco/s…
For the story (Q3), with 4.5 the "narrative jumped in, had dialogue and hinted at a unique story line. b was a bit more schematic": x.com/MitjaMartini/s…
For the poem (Q4), 4.5 "is obviously way better. The rhyme scheme and meter of B are so unsophisticated, A has to be 4.5. The voters have poor taste.": x.com/CNicholson1988…

So... yeah. Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we're just hallucinating things. Or these examples are just not that great. Or it's actually pretty close and this is way too small a sample size. Or all of the above. So we'll just wait for the larger, more thorough LM Arena results.
But at least from my last 2 days of playing around, 4.5 has a new, deeper charm; it's more creative and inventive at writing, and I find myself laughing more at its jokes, standups, and roasts. To be continued :)



