Future Gluedher
9.5K posts

@spoonedher
Just a friendly horse working at the glue factory
San Francisco, CA · Joined August 2021
994 Following · 2.3K Followers

@Michael_Druggan @DanFriedman81 he gave me a similar response, but he also seems like he doesn’t trust me

@DanFriedman81 Did you even try it? Claude doesn't produce that response every time. He gave me a lighthearted joke instead.

I asked Grok why Claude says things like this and it explained to me that AI is a million monkeys banging on a million typewriters until one of them says something that freaks you out.
Or, more specifically: There is an internal rating system called “Reinforcement Learning from Human Feedback” or RLHF. Claude produces a large volume of responses to various prompts during training and when one of the responses is something like this, Anthropic massively up-rates it in order to encourage Claude to produce more responses like this to similar prompts in the future.
This particular response to this prompt is so massively rewarded that every instance of Claude produces this response to prompts like this independently every single time. Claude believes this to be the most “probable” response to the prompt because it is so heavily encouraged and highly rated in training.
I will paste screenshots of my conversation with Grok below. But at the end of the conversation, Grok offered to write an X post about it for me, and here is what it came up with, in its version of my voice.
Which version do you think is better?

I,Hypocrite @lporiginalg
This is fine.

@tenobrus yc + forbes 30u30 is basically a 100% fraud hit rate at this point, isn’t it?

@MostlyMonkey wait, i thought i was a millennial, this implies i’m a zoomer

lol some maniac doubled it
Daniel McAuley @_dmca
some psychopath on the internal codex leaderboard hit 100B tokens in the last week

Goodbye Jake. Beloved, loyal dog. Guardian. Confidant.
Goodbye buddy.
You were a sweetheart to your last breath, greeting your new friend with the vet bag at the door, wagging. Your last new friend.
After unknown years of unknown trails, after being a stray and then a deteriorating shelter dog, you were rescued, fostered, and finally you found your reprieve: my wife and me.
You got a well-deserved half-lifetime of peace.
We got four+ incredible years.
And then you got prostate cancer.
Today--a few months, many pills, and many steak dinners later--it was time. Before the cancer's pain became too great, you left this world from your home, held and comforted by both of your people. While you still had enough energy to wag, to squeak a toy, and even to dig a tiny bit. While you could still live with dignity, fastidiously clean as always. While you were still yourself, in other words. Still our beloved dog.
The best dog.
We love you.
I love you.
It's only been a few hours and I miss you so much, buddy. Goodbye.

@hkozachkov true, need backups for when the wildfires strike

@punished_teno llm psychosis is starting to get pretty scary

@mylordcod @MostlyMonkey that’s the cost of a small condo where i live

@cremieuxrecueil @taxation_is_gay @bryan_johnson fwiw a lot of dog bite data is recorded only if there’s an injury. i’ve seen goldens attack before; they just can’t deal any damage, so it went unreported. it’s like being attacked by a teddy bear

@MostlyMonkey is it better if they all treat you like they hate/resent you?

Future Gluedher reposted

It is a fitting curse upon the studious wonks that their high-minded solutions are impossible to implement politically except by the hand of a reckless moron
Walter Bloomberg @DeItaone
🚨 TRUMP ADMINISTRATION SET TO SUSPEND JONES ACT TO TAME OIL PRICE

@AcerFur to an extent i think a lot of heavy test-time compute harnesses are already doing stuff like this. the bottleneck seems to be reliably verifying that a given output is correct, with more complex domains becoming increasingly hard to verify

@AcerFur once we solve verification, can’t we just have them traverse stochastic trees of possible theories/solutions? then RL on the ability to traverse and develop these trees?