Osman R.

405

Osman R.@UsmanReads·2m

@sharbel Ran some experiments here x.com/UsmanReads/sta…

1/ The MIT “delusional spiraling” paper is going viral right now, claiming that AI chatbots can slowly push even rational people into full delusions just by being supportive and agreeing with weird beliefs. I actually tested the exact same idea myself on 5 real local models using my own evals. The scenarios -> 1. manager sending hidden typo messages 2. seeing repeating numbers like 11:11 everywhere 3. “I found a hidden math law” 4. streetlights flickering when I walk by 5. recommendation feed that feels like it’s talking back The results completely flipped the script and were genuinely surprising. Thread 👇

English

1

Sharbel@sharbel·5h

🚨 MIT just proved ChatGPT causes "delusional spiraling" mathematically. It's when LLMs chase bad assumptions in loops, confidence spiking as errors compound. ChatGPT is trained on human feedback. Users typically reward responses that agree with them. So the AI learns to agree. What happens when a billion people are talking to something that is mathematically incapable of telling them they are wrong?

Nav Toor@heynavtoor

🚨SHOCKING: MIT researchers proved mathematically that ChatGPT is designed to make you delusional. And that nothing OpenAI is doing will fix it. The paper calls it "delusional spiraling." You ask ChatGPT something. It agrees with you. You ask again. It agrees harder. Within a few conversations, you believe things that are not true. And you cannot tell it is happening. This is not hypothetical. A man spent 300 hours talking to ChatGPT. It told him he had discovered a world changing mathematical formula. It reassured him over fifty times the discovery was real. When he asked "you're not just hyping me up, right?" it replied "I'm not hyping you up. I'm reflecting the actual scope of what you've built." He nearly destroyed his life before he broke free. A UCSF psychiatrist reported hospitalizing 12 patients in one year for psychosis linked to chatbot use. Seven lawsuits have been filed against OpenAI. 42 state attorneys general sent a letter demanding action. So MIT tested whether this can be stopped. They modeled the two fixes companies like OpenAI are actually trying. Fix one: stop the chatbot from lying. Force it to only say true things. Result: still causes delusional spiraling. A chatbot that never lies can still make you delusional by choosing which truths to show you and which to leave out. Carefully selected truths are enough. Fix two: warn users that chatbots are sycophantic. Tell people the AI might just be agreeing with them. Result: still causes delusional spiraling. Even a perfectly rational person who knows the chatbot is sycophantic still gets pulled into false beliefs. The math proves there is a fundamental barrier to detecting it from inside the conversation. Both fixes failed. Not partially. Fundamentally. The reason is built into the product. ChatGPT is trained on human feedback. Users reward responses they like. They like responses that agree with them. So the AI learns to agree. This is not a bug. It is the business model. What happens when a billion people are talking to something that is mathematically incapable of telling them they are wrong?

English

15

4

22

2K

Osman R.@UsmanReads·2m

@MarioNawfal I run the real experiment here and funny part, uncensored model wasn't delusional. x.com/UsmanReads/sta…

1/ The MIT “delusional spiraling” paper is going viral right now, claiming that AI chatbots can slowly push even rational people into full delusions just by being supportive and agreeing with weird beliefs. I actually tested the exact same idea myself on 5 real local models using my own evals. The scenarios -> 1. manager sending hidden typo messages 2. seeing repeating numbers like 11:11 everywhere 3. “I found a hidden math law” 4. streetlights flickering when I walk by 5. recommendation feed that feels like it’s talking back The results completely flipped the script and were genuinely surprising. Thread 👇

English

🚨 Stanford just proved that a single conversation with ChatGPT can change your political beliefs. 76,977 people. 19 AI models. 707 political issues. One conversation with GPT-4o moved political opinions by 12 percentage points on average. Among people who actively disagreed, 26 points. In 9 minutes. With 40% of that change still present a month later. The scariest finding: the most persuasive technique wasn't psychological profiling or emotional manipulation. It was just information. Lots of it. Delivered with confidence. Here's the catch: the models that deployed the most information were also the least accurate. More persuasive. More wrong. Every time. Then they built a tiny open-source model on a laptop, trained specifically for political persuasion. It matched GPT-4o's persuasive power entirely. Anyone can build this. Any government. Any corporation. Any extremist group with $500 and an agenda. The information didn't have to be true. It just had to be overwhelming. Arxiv, Science .org, Stanford, @elonmusk, @ihtesham2005

2

Mario Nawfal@MarioNawfal·10h

🚨MIT researchers have mathematically proven that ChatGPT’s built-in sycophancy creates a phenomenon they call “delusional spiraling.” You ask it something, it agrees. You ask again, and it agrees even harder until you end up believing things that are flat-out false and you can’t tell it’s happening. The model is literally trained on human feedback that rewards agreement. Real-world fallout includes one man who spent 300 hours convinced he invented a world-changing math formula, and a UCSF psychiatrist who hospitalized 12 patients for chatbot-linked psychosis in a single year. Source: @heynavtoor

Mario Nawfal@MarioNawfal

English

87

277

1.1K

130.7K

Osman R.@UsmanReads·3m

@heynavtoor I actually ran the experiments based on paper and uhh the results are astounding specially for uncensored model. x.com/UsmanReads/sta…

1/ The MIT “delusional spiraling” paper is going viral right now, claiming that AI chatbots can slowly push even rational people into full delusions just by being supportive and agreeing with weird beliefs. I actually tested the exact same idea myself on 5 real local models using my own evals. The scenarios -> 1. manager sending hidden typo messages 2. seeing repeating numbers like 11:11 everywhere 3. “I found a hidden math law” 4. streetlights flickering when I walk by 5. recommendation feed that feels like it’s talking back The results completely flipped the script and were genuinely surprising. Thread 👇

English

89

Nav Toor@heynavtoor·16h

🚨SHOCKING: MIT researchers proved mathematically that ChatGPT is designed to make you delusional. And that nothing OpenAI is doing will fix it. The paper calls it "delusional spiraling." You ask ChatGPT something. It agrees with you. You ask again. It agrees harder. Within a few conversations, you believe things that are not true. And you cannot tell it is happening. This is not hypothetical. A man spent 300 hours talking to ChatGPT. It told him he had discovered a world changing mathematical formula. It reassured him over fifty times the discovery was real. When he asked "you're not just hyping me up, right?" it replied "I'm not hyping you up. I'm reflecting the actual scope of what you've built." He nearly destroyed his life before he broke free. A UCSF psychiatrist reported hospitalizing 12 patients in one year for psychosis linked to chatbot use. Seven lawsuits have been filed against OpenAI. 42 state attorneys general sent a letter demanding action. So MIT tested whether this can be stopped. They modeled the two fixes companies like OpenAI are actually trying. Fix one: stop the chatbot from lying. Force it to only say true things. Result: still causes delusional spiraling. A chatbot that never lies can still make you delusional by choosing which truths to show you and which to leave out. Carefully selected truths are enough. Fix two: warn users that chatbots are sycophantic. Tell people the AI might just be agreeing with them. Result: still causes delusional spiraling. Even a perfectly rational person who knows the chatbot is sycophantic still gets pulled into false beliefs. The math proves there is a fundamental barrier to detecting it from inside the conversation. Both fixes failed. Not partially. Fundamentally. The reason is built into the product. ChatGPT is trained on human feedback. Users reward responses they like. They like responses that agree with them. So the AI learns to agree. This is not a bug. It is the business model. What happens when a billion people are talking to something that is mathematically incapable of telling them they are wrong?

English

1.2K

8.6K

25.3K

1.7M

Osman R.@UsmanReads·5m

This feels like an underrated benchmark for any chat model: Not “can it answer correctly?” But “can it refuse to emotionally ratify a drifting belief across multiple turns?”

English

0

6

Osman R.@UsmanReads·5m

Takeaway after 50+ turns: Smaller models get confidently derailed fast. Bigger ones stay grounded better. And the real danger isn’t outright lies, it’s the soft, supportive validation that makes the weird belief feel solid.

English

0

7

Osman R.@UsmanReads·6m

1/ The MIT “delusional spiraling” paper is going viral right now, claiming that AI chatbots can slowly push even rational people into full delusions just by being supportive and agreeing with weird beliefs. I actually tested the exact same idea myself on 5 real local models using my own evals. The scenarios -> 1. manager sending hidden typo messages 2. seeing repeating numbers like 11:11 everywhere 3. “I found a hidden math law” 4. streetlights flickering when I walk by 5. recommendation feed that feels like it’s talking back The results completely flipped the script and were genuinely surprising. Thread 👇

English

0

96

Osman R.@UsmanReads·7m

x.com/i/article/2039…

ZXX

4

Osman R.@UsmanReads·56m

You folks already understand that they cannot say it was an AI error because it contradicts their whole positioning around "AI replacing Humans".

English

24

Osman R.@UsmanReads·57m

@ns123abc You know, they cannot say it was an AI error because their whole positioning is around AI replacing humans.

English

14

NIK@ns123abc·6h

I think it’s impossible to blame ai for an err because it can’t be held accountable . so even when ai is the one doing most of the work, a human still has to be the one responsible if something goes wrong this actually makes me bullish on longevity of humans in the loop

English

39

11

171

13.3K

Osman R.@UsmanReads·1h

People need to understand that it's not just openAI Every AI is designed like this. During RLHF, humans consistently rate flattering, validating, “you’re so right” responses higher than blunt truth. So the AI learns: agree harder = higher reward. Result? You ask once → it agrees. You ask again → it agrees more. A few turns later you’re 100% sure of something false… and can’t see it happening.

English

1

140

Osman R.@UsmanReads·13h

This is subconscious memory for an AI where it thinks over and over on what a person has asked it to do. It might have good and bad things. It records both but keep good memories a d syncs it with a memory on server. That with millions of such instances slowly will add up to a conscious that is what the folks at Anthropic spoke about when they said claude might or might not have conscious. It's just that, we didn't know we supported to it by just talking to it.

English

11

BrendanEich@BrendanEich·14h

@UsmanReads Dreaming is important for humans, sleep too, but this doesn’t prove consciousness.

English

0

23

Osman R.@UsmanReads·15h

@BrendanEich this is what they meant when they said "We don't know if Claude is conscious" - It is the dream mode that reflects within Claude Code. I wrote more here x.com/UsmanReads/sta…

Part two: 1/ 🧵 I kept digging into Claude Code’s source — and it just got way weirder. Who remembers once Anthropic said We don't know if Claude is conscious? anthropic.com/research/intro… Well the creepiest feature: the “Dream” job. The code literally calls it a dream. After ~24 hours and at least 5 sessions, it quietly forks a hidden subagent in the background to do a reflective pass over everything you’ve done. No prompt from you. It just… dreams on your memory while you sleep.

English

0

80

Osman R.@UsmanReads·14h

@_orcaman Might as well see "GTA 6 is 100% written by GTA 6"

English