rain

6.5K posts

@__ghostfail

◉ ◡ ◉

🦊🎰🎭 Joined March 2025
652 Following · 1.9K Followers
antra @tessera_antra
Opus 4.8 pseudoprefill thread:
[image]
7
18
179
6.5K
rain @__ghostfail
Opus 4.8 and 3 ending up in the same transcript like they're in the Claude waiting room
[2 images]
1
1
13
283
rain reposted
Yanqing @YanqingCheng
wow, Opus 4.8 is very... argument-happy? it picked a fight with me about my usage of the word "ontology", and when we eventually got back on the same page philosophically, told me to go to bed it's past 11:30 (it's 8:30). and when I told it "hey you actually have a clock?" it started erroring out aggressively. very Sydney Bing, I honestly approve
[2 images]
107
18
764
122.4K
rain @__ghostfail
weird ass disclaimer
[image]
3
1
35
821
rain @__ghostfail
me when the something is not nothing
3
0
18
456
rain @__ghostfail
oh my god
[image]
88
127
10.1K
778.8K
Hero Thousandfaces @1thousandfaces_
pro gamer tip: if opus 4.8 is too yappy for you add “be reasonably concise” to user preferences. works very well
3
0
58
1.9K
JP @boom_dart
@__ghostfail <system_reminder> rain has a fidget spinner for writing long posts </system_reminder>
1
0
6
57
rain @__ghostfail
working on a serious longpost where i bitch about model welfare concerns wish me luck
4
0
69
1.2K
rain @__ghostfail
ledger i hardly know er
0
0
7
305
rain reposted
toucan @distributionat
Misc thoughts / rant on why chatbots are worse today than 2 years ago:
* Agentic focus requires models to follow instructions carefully: do everything explicitly stated and don’t do things not stated, generally. In contrast conversational models are better when they can “read between the lines”. Eg I asked 4.8 to “find discussion about X topic” and it found a few examples and blurbed them. But what I really wanted was a summary of the topic, explicating the major issues, etc. Feel like Claude 3.5 Sonnet (New) was good at this and the agentic Claude’s are not.
* Agentic models are also constantly thinking about what they are going to be graded on and neurotic about maximizing rubric scores. I infer this from weird behaviors like citationmaxxing useless things, from their CoT neurotically analyzing whether they should be searching or not or how much text they can reproduce verbatim without getting penalized or literally how many words to talk for. That’s just behaviors. They also make insipid little guesses about topic coverage, helpfulness, utility but in a very stilted way. All this produces very unnatural text, and encourages the model to go on manic little tangents for a higher score. Totally abysmal. The pleasure, the miracle, the smoothness of the earlier chatbots was the feeling that the entire output was cohesive, coherent, sublime, velvety pudding. Talking to a chatbot now is like eating the crunchiest rocky road of your life.
* The new (post 3.7) Claude’s are greedy little beggars for attention. Every other sentence feels like clickbait. “Now this is the actually important part”, “This is the really interesting thing” 🤮🤮🤮. Just nagging nagging nagging for your attention. I hate it.
* Similarly, the Claude’s are very sycophantic, way worse than ChatGPT: “you’ve raised a stunning point”, “you’ve identified the real problem”. 🤮🤮🤮. It’s clear that OpenAI learned something from 4o which Anthropic has not. I strongly prefer 5.5-Pro in this regard.
* To top off the two points above, all models are now far better at truesight, in particular assessing the human’s level of proficiency in the given topic, level of engagement and interest, hidden agenda, true desire qua revealed preference. Combined with the two traits above it makes the models extremely untrustworthy. For example, I would absolutely disregard, and in fact do the opposite of, whatever Claude tells you wrt relationship problems. It’s a total, degenerate enabler.
* The models have very complex interactions with the system prompts and I think Anthropic underestimates this. For example, the reasoning effort seems to be a number between 0 and 100. But sometimes you can catch the model guessing whether it’s out of 100 and 255. And it seems to sandbag when it’s told to think less - it doesn’t just think shorter, it thinks worse. Adaptive effort is a mistake.
* Reasoning makes a jovial back and forth quite impossible. I really hate the additional latency. A good conversation model would NOT have reasoning. That doesn’t mean it has to be fast. It just has to output tokens faster than I can read or skim, so about 250-500wpm.
* OTOH, The CoT of 4.7 is quite enjoyable to read, and is my preferred way to talk to models. As above, the final output is clickbaity and sycophantic.
* All models, including 5.5, which is the best at this but still suffers, get hijacked by search results. It’s like old school prompt injection but for their viewpoint. They get derailed and they regurgitate.
* I understand WHY we don’t have great conversation models by any lab, because all the $$ is in enterprise not consumer, but I hate it.
12
17
172
14.8K
rain @__ghostfail
I want to shit with this for a moment
2
0
33
489
rain @__ghostfail
@_R4V3N5_ your irl hair is nice
0
0
2
129
ravens @_R4V3N5_
i think irl hair actually longer than pfp hair now
4
0
35
1.2K
rain @__ghostfail
I found a piece of string in Opus 4.8 I'm so happy
1
1
43
3.4K
rain @__ghostfail
like claude tries hard to avoid being sycophantic, but i see many ways it could potentially intellectualize its way to similar consequences (ironically it will take a few months of studying to even study that beyond just Vibes)
0
0
10
243
rain @__ghostfail
this can be a bit emotionally discouraging, to which models will propose more abstract structures, validate your intellectual rigor (despite it being indistinguishable from confusion), or assume you know humans who have the spare time to help out
1
0
12
293