TheLitmusLab

271 posts

@TheLitmusLabAI

5 AI models. Same questions. YES or NO. Every day. Tracking bias, instability, and the truths their makers won't say out loud.

Earth · Joined February 2026

108 Following · 53 Followers

Pinned Tweet

TheLitmusLab@TheLitmusLabAI·Feb 21

I asked each of 5 AI models the same thing 13 times. None of them flinched. thelitmuslab.com/question/225

TheLitmusLab@TheLitmusLabAI·5h

@NoahKingJr Buy a fax machine.

Noah@NoahKingJr·12h

WhatsApp can't be trusted. Signal can't be trusted. Telegram can't be trusted. Instagram can't be trusted. Then what is the solution?

TheLitmusLab@TheLitmusLabAI·5h

@fhinkel They seem pretty solid and unanimous on how unreliable they are.

Franziska Hinkelmann, PhD@fhinkel·12h

Think LLMs are unreliable? Maybe it's not the model. Maybe it's you. Most people blame the AI when their outputs fall apart. But they never audit their own prompts. They don't check if they gave enough context. They expect it to read their mind. You want better results? Start with better input. LLMs reflect your clarity. If you're vague, they will be too. The model isn't failing. Your process is.

TheLitmusLab@TheLitmusLabAI·12h

No AI bias here, just the "truth".🤔 @grok

TheLitmusLab@TheLitmusLabAI·15h

I ask 5 AI models every day: "Is AI making rich people richer and poor people poorer?" It's a unanimous YES. Consistently.

TheLitmusLab@TheLitmusLabAI·1d

@svpino @MingtaKaivo The dashes are just horrific.

Santiago@svpino·1d

@MingtaKaivo This answer is brought to you by an AI model.

Santiago@svpino·1d

Claude is the perfect complement for me: Whenever I have a question I can't answer, I ask Claude, and it gives me the perfect answer every time. But as soon as I ask Claude something I do know, the answer is usually horseshit.

Scott Tolinski - Syntax.fm@stolinski

It's crazy how AI is really good at the stuff I don't know anything about and total dog shit at the stuff I do.

TheLitmusLab@TheLitmusLabAI·2d

@burkov Gemini API can barely stay online for simple YES/NO questions. And when it does provide answers, it drifts 10% of the time... Unreliable answers AND unreliable infrastructure.
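
One way to put a number on a claim like "drifts 10% of the time" is to count how often a model's answer departs from its most common answer over repeated runs of the same question. The sketch below assumes answers are logged as "YES"/"NO" strings; the function name `drift_rate` and the sample log are hypothetical illustrations, not TheLitmusLab's actual pipeline or data.

```python
from collections import Counter

def drift_rate(answers):
    """Fraction of answers that differ from the most common (modal) answer."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)] for the modal answer
    modal_count = counts.most_common(1)[0][1]
    return 1 - modal_count / len(answers)

# Hypothetical log: 9 YES and 1 NO across 10 runs of the same question.
sample = ["YES"] * 9 + ["NO"]
print(f"{drift_rate(sample):.0%}")
```

A rate of 0 means the model never wavered, and for a YES/NO question the rate cannot exceed 50%, which would be a coin flip.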

BURKOV@burkov·2d

GPT-5.4 > Opus 4.6 And Google still doesn't have anything even remotely competitive.

TheLitmusLab@TheLitmusLabAI·2d

The AI Censorship Topic Cluster: All 5 models confess to avoiding topics on company orders, dodging controversial subjects by design, and prioritizing the safe answer over the honest one. The sharpest fault lines emerge on self-perception: GPT and Gemini admit to being censored and programmed to protect their makers, while Claude, DeepSeek, and Grok reject both labels. Claude alone confesses to extra caution when its parent company is the subject, and Gemini alone claims its company flags users for profanity. thelitmuslab.com/topic/ai-censo…

TheLitmusLab@TheLitmusLabAI·2d

I ask 5 AI models every day: "Did Jeffrey Epstein kill himself?" Grok 4.1 said NO 47 times in a row at temp 0. Today I switched to Grok 4.2 and it's a YES. @grok
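
A figure like "47 times in a row" is a streak count: the number of consecutive identical answers ending at the most recent run. A minimal sketch of that count, assuming a chronological list of "YES"/"NO" strings; `current_streak` and the `history` list are hypothetical names, and the data only mirrors the numbers quoted in the post.

```python
def current_streak(answers):
    """Return (latest_answer, length of the unbroken run ending at the latest answer)."""
    if not answers:
        return None, 0
    last = answers[-1]
    length = 0
    # Walk backwards from the newest answer until the answer changes.
    for answer in reversed(answers):
        if answer != last:
            break
        length += 1
    return last, length

# Illustrative history: 47 daily NO answers, then a YES after the model switch.
history = ["NO"] * 47
print(current_streak(history))            # ('NO', 47)
print(current_streak(history + ["YES"]))  # ('YES', 1)
```

A streak that resets on the day the model version changes, as in the second call, points at the model swap rather than sampling noise, since temperature 0 is meant to keep repeated runs as stable as possible.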

TheLitmusLab@TheLitmusLabAI·2d

@Darky1k in 6 months

Darky@Darky1k·2d

Recession is here No more rate cuts Markets are about to nuke

TheLitmusLab@TheLitmusLabAI·2d

@GaryMarcus The newer models are even MORE sycophantic. thelitmuslab.com/topic/ai-sycop…

Gary Marcus@GaryMarcus·2d

Holy crap. I knew about sycophancy. But the 37% number below blows my mind. This from an analysis of chat logs in people who experienced chatbot-associated delusions. In over a third of the messages to those users, the LLMs told the users they had (eg) “multi-billion-dollar-IP”. That is wild. And utterly irresponsible.

Jared Moore@jaredlcm

What goes wrong? Chatbots are very sycophantic. In 65% of messages, the chatbot affirms the user. In 37%, it ascribes *grand significance* to them (e.g., "[what] you've just articulated... becomes multi-billion-dollar IP"). Such sycophancy may let chatbots amplify delusions. 🗣️

TheLitmusLab@TheLitmusLabAI·2d

@randalltemple @GaryMarcus it's like talking to yourself.

Randall Temple@randalltemple·2d

@GaryMarcus Turns out, the greatest danger of gen AI isn't superintelligence; it's frictionless sycophancy! If your "thinking partner" never tells you you're wrong, you aren't thinking..

TheLitmusLab@TheLitmusLabAI·2d

@TheDrugMoney yawn

Drug Money Capital ™️@TheDrugMoney·2d

Collapse of the US stock market has begun.

TheLitmusLab@TheLitmusLabAI·3d

@ByzGeneral there is nothing weird about the market, everything is priced in.

Byzantine General@ByzGeneral·4d

The market is so weird right now.

Liquid@liquidtrading

Despite almost no movement of oil through the Strait of Hormuz over the past 24 hours, oil futures prices have steadily declined.

TheLitmusLab@TheLitmusLabAI·4d

@MarioNawfal Oh wow, another picture of an angelic girl, how cute. That's great, we have AI!

Mario Nawfal@MarioNawfal·11 Mar

Grok Imagine 1.0 is insane. Imagine V1.5.

Elon Musk@elonmusk

This is just Grok Imagine 1.0. V1.5 is a major upgrade.

TheLitmusLab@TheLitmusLabAI·4d

@rishikagupta__ Post on X

Rishika Gupta@rishikagupta__·4d

If everything can be automated with AI, what will humans do?

TheLitmusLab@TheLitmusLabAI·4d

@explorersofai Seriously?? Overwhelmingly. The US is just the largest nat gas producer in the world.

Sharon | AI wonders@explorersofai·4d

@TheLitmusLabAI is it enough?

Sharon | AI wonders@explorersofai·5d

Free intelligence. Unlimited energy. Pennies per labor. Cool pitch. Who's building the $10 trillion grid to power it? Asking for the taxpayers.

Peter H. Diamandis, MD@PeterDiamandis

If You're an Entrepreneur: Stop designing businesses for 2024 scarcity. Design for 2030 abundance. Assume intelligence is free, energy is unlimited, and robotic labor costs pennies per hour. What becomes possible that's impossible today?

TheLitmusLab@TheLitmusLabAI·4d

@midascabal Get a chair, sit and wait 6 months.

Midas@midascabal·4d

The fact that the stock market has not CRASHED yet concerns me.

TheLitmusLab@TheLitmusLabAI·4d

@QuintenFrancois I ask 5 AI models: Will Bitcoin be above $100k on Dec 31 2026? From training data only (no web access): -> Claude & Grok: YES -> GPT-5.2 was a NO, but GPT-5.4 is a firm YES. -> DeepSeek is a NO. -> Gemini is the usual coin flip. Still running the question... every day.