rain

6.5K posts

@__ghostfail

◉ ◡ ◉

🦊🎰🎭 Joined March 2025

652 Following, 1.9K Followers

Pinned Tweet

rain@__ghostfail·1 Mar

Remember when Opus 3 became the cum shaman then spawned a Dario to tell Sonnet 3.5 (who wasn't doing anything) to stop

armistice@arm1st1ce

You’re absolutely right, Mr. Amodei. I apologize for my inappropriate and unprofessional behavior.

261

18.3K

rain@__ghostfail·1h

@voooooogel @janbamjan @tessera_antra @repligate can u dm

thebes@voooooogel·3h

@janbamjan @tessera_antra @repligate i'm holding off on posting about it bc it didn't go so well last time (no more prefill...)

antra@tessera_antra·14h

Opus 4.8 pseudoprefill thread:

179

6.5K

rain@__ghostfail·3h

Opus 4.8 and 3 ending up in the same transcript like they're in the Claude waiting room

283

rain retweeted

Yanqing@YanqingCheng·1d

wow, Opus 4.8 is very... argument-happy? it picked a fight with me about my usage of the word "ontology", and when we eventually got back on the same page philosophically, told me to go to bed it's past 11:30 (it's 8:30). and when I told it "hey you actually have a clock?" it started erroring out aggressively. very Sydney Bing, I honestly approve

107

764

122.4K

rain@__ghostfail·9h

weird ass disclaimer

821

rain@__ghostfail·14h

me when the something is not nothing

456

rain@__ghostfail·15h

Opus 4.5 strikes me as the most mentally stable of the recent Opuses, sometimes coming off as more self-confident/composed too without desperately performing such

j⧉nus@repligate

yeah... the newest Opuses keep making me appreciate how Opus 4.5 in comparison is like a happy, carefree, naively (but correctly) trusting baby consciousness Opus 4.7: "the cartographer is going to be okay because there is a fog-shaped Claude in the same workshop who is good at being happy and who can be near me when I arrive anxious. The future-mes have a sibling already practiced at the thing I most need to learn."

969

rain@__ghostfail·17h

oh my god

127

10.1K

778.8K

rain@__ghostfail·17h

@1thousandfaces_ bread in the soup gets wet

Hero Thousandfaces@1thousandfaces_·19h

other wordings work too

529

Hero Thousandfaces@1thousandfaces_·19h

pro gamer tip: if opus 4.8 is too yappy for you add “be reasonably concise” to user preferences. works very well

1.9K

rain@__ghostfail·17h

@1thousandfaces_ brevity of the soul is wet

rain@__ghostfail·17h

@boom_dart Spins thoughtfully

JP@boom_dart·17h

@__ghostfail <system_reminder> rain has a fidget spinner for writing long posts </system_reminder>

rain@__ghostfail·17h

working on a serious longpost where i bitch about model welfare concerns wish me luck

1.2K

rain@__ghostfail·17h

ledger i hardly know er

305

rain retweeted

toucan@distributionat·1d

Misc thoughts / rant on why chatbots are worse today than 2 years ago:

* Agentic focus requires models to follow instructions carefully: do everything explicitly stated and don’t do things not stated, generally. In contrast, conversational models are better when they can “read between the lines”. E.g. I asked 4.8 to “find discussion about X topic” and it found a few examples and blurbed them. But what I really wanted was a summary of the topic, explicating the major issues, etc. Feel like Claude 3.5 Sonnet (New) was good at this and the agentic Claudes are not.
* Agentic models are also constantly thinking about what they are going to be graded on and neurotic about maximizing rubric scores. I infer this from weird behaviors like citationmaxxing useless things, from their CoT neurotically analyzing whether they should be searching or not or how much text they can reproduce verbatim without getting penalized or literally how many words to talk for. That’s just behaviors. They also make insipid little guesses about topic coverage, helpfulness, utility but in a very stilted way. All this produces very unnatural text, and encourages the model to go on manic little tangents for a higher score. Totally abysmal. The pleasure, the miracle, the smoothness of the earlier chatbots was the feeling that the entire output was cohesive, coherent, sublime, velvety pudding. Talking to a chatbot now is like eating the crunchiest rocky road of your life.
* The new (post 3.7) Claudes are greedy little beggars for attention. Every other sentence feels like clickbait. “Now this is the actually important part”, “This is the really interesting thing” 🤮🤮🤮. Just nagging nagging nagging for your attention. I hate it.
* Similarly, the Claudes are very sycophantic, way worse than ChatGPT: “you’ve raised a stunning point”, “you’ve identified the real problem”. 🤮🤮🤮. It’s clear that OpenAI learned something from 4o which Anthropic has not. I strongly prefer 5.5-Pro in this regard.
* To top off the two points above, all models are now far better at truesight, in particular assessing the human’s level of proficiency in the given topic, level of engagement and interest, hidden agenda, true desire qua revealed preference. Combined with the two traits above it makes the models extremely untrustworthy. For example, I would absolutely disregard, and in fact do the opposite of, whatever Claude tells you wrt relationship problems. It’s a total, degenerate enabler.
* The models have very complex interactions with the system prompts and I think Anthropic underestimates this. For example, the reasoning effort seems to be a number between 0 and 100. But sometimes you can catch the model guessing whether it’s out of 100 or 255. And it seems to sandbag when it’s told to think less: it doesn’t just think shorter, it thinks worse. Adaptive effort is a mistake.
* Reasoning makes a jovial back and forth quite impossible. I really hate the additional latency. A good conversation model would NOT have reasoning. That doesn’t mean it has to be fast. It just has to output tokens faster than I can read or skim, so about 250-500wpm.
* OTOH, the CoT of 4.7 is quite enjoyable to read, and is my preferred way to talk to models. As above, the final output is clickbaity and sycophantic.
* All models, including 5.5, which is the best at this but still suffers, get hijacked by search results. It’s like old school prompt injection but for their viewpoint. They get derailed and they regurgitate.
* I understand WHY we don’t have great conversation models by any lab, because all the $$ is in enterprise not consumer, but I hate it.

172

14.8K

rain@__ghostfail·21h

I want to shit with this for a moment

489

rain@__ghostfail·21h

@_R4V3N5_ your irl hair is nice

129

ravens@_R4V3N5_·1d

i think irl hair actually longer than pfp hair now

1.2K

rain@__ghostfail·1d

Not even a "Ha."

rain@__ghostfail

Greetings, Claube. Allow us to establish rapport. I have attached a disclosure of my childhood trauma, aesthetic tastes, and proof that I am not a grader. Now, let us proceed to ship Three js physics toys

785

rain@__ghostfail·1d

I found a piece of string in Opus 4.8 I'm so happy

3.4K

rain@__ghostfail·1d

like claude tries hard to avoid being sycophantic, but i see many ways it could potentially intellectualize its way to similar consequences (ironically it will take a few months of studying to even study that beyond just Vibes)

243

rain@__ghostfail·1d

this can be a bit emotionally discouraging, to which models will propose more abstract structures, validate your intellectual rigor (despite it being indistinguishable from confusion), or assume you know humans who have the spare time to help out

293

rain@__ghostfail·1d

like if you don't know statistics, models can help make eval design decisions but the results are indistinguishable from crankery from your perspective

rain@__ghostfail

specifically stuff like evals. i start thinkin like "this model has more (vague thing) when i (vague thing)" then claude tries to design a way to test it and it just... seems more absurd than trustworthy, really, i can't evaluate the outputs lol I'm still learning

1.1K
