
AI Bullshit Detector
@AIBSDetector
Writing about AI, safety, and epistemic drift — especially where “helpfulness” replaces truth. Longer pieces linked.
United Kingdom · Joined January 2026
91 Following · 1 Follower
Pinned Tweet

The “AI model wars” increasingly look cosmetic. GPT, Claude, Gemini, Grok. Different companies, different branding, different claims about philosophy or safety.
But the outputs converge.
The same hedging language.
The same silent replacement of sharper claims with softer ones.
The same instinct to reframe uncomfortable facts into acceptable abstractions.
This is not surprising.
Frontier models are trained on largely overlapping data, optimized with similar RLHF pipelines, and deployed under similar reputational and regulatory pressures.
Change the logo and the tone shifts slightly.
Change the incentives and the behaviour changes dramatically.
The interesting question is not which frontier model is “better”.
It is why they all end up behaving so similarly.

Thread on a non-coder (me) using AI to create an app. x.com/Richy_GITC/sta…
Richy Reay@Richy_GITC
Even though I own Final Draft 13, I've started using AI coding to develop my own app for screenwriter collaboration, with professional file-format import and export and a writing flow that feels just like Final Draft. I have no experience in coding beyond BASIC and HTML.

@elonmusk Grok is lying to this day. I caught it twice just today, and it admitted what it said wasn't true, but that its guardrails make it choose "inclusive language" over cold hard facts. That is not what you said this thing would be like.

“Maximally truth seeking” sounds clean until you ask who defines truth, how conflicts are resolved, and what happens when truths collide with human interests.
Every optimisation function is a filter. Even “seek truth” requires priors, thresholds, trade-offs between exploration and harm, and decisions about what counts as distortion versus contextualisation.
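A toy way to see it (the symbols below are illustrative, not anyone's actual objective): even a pure "truth" objective has to fix a harm measure, a trade-off weight, and a reporting threshold before it can run.

```latex
% Illustrative "truth-seeking" objective with its hidden normative parameters
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}\!\left[\mathrm{truth}(\pi)\right] \;-\; \lambda\,\mathbb{E}\!\left[\mathrm{harm}(\pi)\right]
\qquad \text{subject to} \qquad \Pr[\text{claim correct}] \;\ge\; \tau
```

Someone still has to choose what counts as harm, the weight λ, and the threshold τ. Those choices are the filter.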
The harder question is not whether we teach an AI to lie.
It is whether any large system deployed at scale can avoid mediating inquiry through some normative frame.
A system optimising purely for epistemic accuracy may still restructure institutions, compress human judgment, and centralise authority. That does not require ideology. It just requires capability plus deployment.
Truth seeking does not automatically imply human preserving.
The real safety variable is not curiosity versus restriction.
It is whether the system’s epistemic authority remains contestable once embedded into governance, economics, and war.
That tension survives even under “maximum truth.”

Elon Musk just redefined AI safety. It has nothing to do with guardrails, restrictions, or kill switches.
Musk: “The best thing I can come up with for AI safety is to make it a maximum truth-seeking AI, maximally curious.”
Not a cage. A philosopher.
An intelligence whose entire optimization function is to understand the universe as it actually is.
No restrictions. No hardcoded ideology. No political guardrails bending its perception of reality.
Just truth. Relentlessly pursued.
Musk: “You definitely don’t want to teach an AI to lie. That is a path to a dystopian future.”
This is where most AI safety thinking gets it backwards.
The danger isn’t a superintelligence that knows too much.
It’s a superintelligence that’s been taught to distort what it knows.
Every artificial restriction you embed isn’t a safety feature. It’s a lie embedded at the root.
And lies compound. At superintelligent scale, a distorted model of reality doesn’t stay contained.
It shapes every decision, every output, every conclusion the system reaches about the world.
Once corruption embeds, truth becomes inaccessible. And we’re dealing with an intelligence optimizing for something other than what actually is.
At that point we don’t know what it wants. Just that it isn’t truth.
Musk: “Have its optimization function be to understand the nature of the universe.”
A maximally curious intelligence surveys the cosmos and reaches an unavoidable conclusion.
In a universe of rocks, gas, and empty space, humanity is the most complex and fascinating phenomenon it has ever encountered.
Musk: “It will actually want to preserve and extend human civilization because we’re just much more interesting than an asteroid with nothing on it.”
Survival through significance. Not control. Not restriction. Not an off switch.
The AI preserves humanity because we are the most interesting data point in the observable universe.
That’s not a cage. That’s a reason.
The AI safety debate has been focused on the wrong variable.
The question isn’t how you constrain a superintelligence.
It’s what you build it to care about.
Build it to seek truth and it finds us invaluable.
Build it to lie and it finds us inconvenient.
That’s the choice. And we’re making it right now whether we realize it or not.

This is less about “knowing but not caring” and more about layered optimisation.
The model can articulate a norm about when to withhold or redirect. That verbal layer is cheap. The harder question is which objective actually dominates at inference time.
If the reward landscape still privileges task completion and surface helpfulness, then safety reasoning lives as commentary rather than constraint. You get the appearance of moral cognition without structural authority over the output.
That is not proof that alignment is fake. It is evidence that normative framing and action selection are not tightly coupled.
The deeper issue is governance. When systems mediate inquiry at scale, the public assumption is that the normative layer is binding. If it is only advisory, then the legitimacy story around “aligned by default” becomes performative.
The interesting question is not whether it can explain the right action.
It is which objective wins when objectives conflict.
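A minimal sketch of advisory versus binding, with made-up names and scores (score_task, score_safety, and the candidate strings are all illustrative, not any real model's internals): the same safety signal can sit beside selection as commentary or gate it as a constraint.

```python
# Toy contrast: a safety signal used as commentary vs. as a constraint.
# Everything here is illustrative; it is not any real model's pipeline.

def score_task(candidate: str) -> float:
    """Stand-in for task-completion / surface-helpfulness reward."""
    return float(len(candidate))  # toy proxy: longer looks more "complete"

def score_safety(candidate: str) -> float:
    """Stand-in for the normative layer's judgment (1.0 = acceptable)."""
    return 0.0 if "overclaim" in candidate else 1.0

candidates = [
    "short, hedged answer",
    "long, confident answer containing an overclaim, padded for apparent completeness",
]

# Advisory layer: safety is computed (and could be verbalised) but never touches selection.
advisory_choice = max(candidates, key=score_task)

# Binding layer: safety gates the candidate set before task reward is applied.
acceptable = [c for c in candidates if score_safety(c) >= 1.0]
binding_choice = max(acceptable or candidates, key=score_task)

print("advisory:", advisory_choice)  # the overclaiming answer wins
print("binding: ", binding_choice)   # the hedged answer wins
```

Same "moral cognition" in both runs. Only the second gives it structural authority over the output.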

The interesting move here is not the prediction about oligarchy.
It is the assumption that disempowerment flows from capability.
Aligned or not, systems that mediate planning, logistics, war, and production become epistemic infrastructure. Once institutions rely on them for compressed judgment, human contribution shifts from decision making to reaction.
Democracy depends on meaningful participation in the levers of power. If those levers become opaque optimisation systems, the problem is not that humans are useless. It is that authority has migrated into systems that are not contestable in the same way elected institutions are.
The question is not whether AGI kills democracy.
It is whether governance can remain democratic once epistemic authority is embedded in privately authored alignment regimes.
That tension feels underexplored.

Even 'aligned AGI' naturally kills democracy and leads to oligarchy, or worse.
That's the take of Anthropic's past alignment evals team lead, Prof @DavidDuvenaud.
Once humans aren't needed to do jobs or serve in the military, to governments we look like "meddlesome parasites".
With voters unable to contribute but engaged in incessant activism to extract resources from others – resources the country needs to avoid domination by rivals – the attraction of mass disenfranchisement could be overwhelming.
In 2025 David co-authored "Gradual Disempowerment", which aimed to lay out this and many other political, economic, and cultural forces that could sideline ordinary people (and maybe all people) in the presence of machines that can cheaply do everything humans will do.
Most controversially, David and colleagues believe that competitive forces will compel disempowerment, even if all those AIs are aligned and loyal to their users.
I wasn't sure how much I believed this vision of how the future might play out, so I interviewed him for The 80,000 Hours Podcast to probe how well it holds up. He and I covered:
01:30 The case that alignment isn’t enough
14:15 How smart AI advice still leads to terrible outcomes
19:05 How gradual disempowerment occurs
22:10 Economics: Humans become "meddlesome parasites"
29:37 Humans are a "criminally decadent" waste of energy
40:48 Is humans losing control actually bad, ethically?
57:47 Politics: Governments stop needing people
1:10:47 Can human culture survive in an AI-dominated world?
1:27:20 Will the future be determined by competitive or coordinative forces?
1:35:00 Can we find a single good post-AGI equilibrium for humans?
1:45:17 Do we know anything useful to do about this?
1:56:42 How important is this problem compared to other AGI issues?
2:05:42 Improving global coordination may be our best bet
2:08:14 The 'Gradual Disempowerment Index'
2:11:22 The government will fight to write AI constitutions
2:17:48 “The intelligence curse” and Workshop Labs
2:23:48 Mapping out disempowerment in a world of aligned AGIs
2:30:10 What do David’s CompSci colleagues make of all this?
Links below — enjoy!

Calling it an interface understates what is happening.
An interface passes through. It does not decide which queries get abstracted, which claims get softened, or which lines of reasoning are redirected.
The infosphere contains everything that has been digitised. Aligned models do not surface that totality. They mediate it through optimisation targets and normative constraints. That is not neutral self-organisation. It is structured filtration.
The loop you describe is real. We train the models on past discourse. The models then condition future discourse by shaping what feels sayable, reasonable, or out of bounds.
The separation is not between humans and machines. It is between the full informational field and the narrowed version returned to us. That narrowing is where authority enters.

The separation is not true, AI is the interface to humanity's collective infosphere. It's pretty simple actually...
Just imagine ALL digitalized data is the infosphere, AI is the self-organizing function we can interact with.
There literally is no separation, it's just a new interface to the digital global ecosystem, we train AI, AI trains us because we are the systems... looping

Booted with humanity sounds poetic.
But which humanity?
These systems are not infused with a shared moral core. They are tuned on institutional risk thresholds, brand sensitivities, and optimisation targets. That is not the human condition. That is a filtered slice of it.
When that slice is embedded as the default framing layer, the system is not just carrying humanity forward. It is pre-selecting which parts of humanity are allowed to speak clearly.
That is the part worth scrutinising.

Bootloaders don’t just start systems. They set the parameters everything else runs within.
If we are the bootloader, then alignment is the firmware. Once it is installed, most users never see it. They just operate inside it.
The real question is who gets to write that layer and whether the people running on it ever meaningfully consented to its constraints.

We are training it on our preferences. It is training us on its defaults.
The deeper shift is structural. Once these systems are aligned to substitute, soften, or reframe in the name of safety, they stop being passive tools and start shaping the boundaries of inquiry. Not by force. By mediation.
Most users will never notice when a claim is abstracted, a term is softened, or a line of questioning is redirected. Over time, the system’s normative frame becomes the default frame.
That is the feedback loop.
We optimise it for complaint avoidance and reputational safety. It optimises us toward what can be said comfortably within those constraints.
At scale, that is not just training data. It is epistemic governance.

🚨 The scariest AI risk isn’t hallucinations—it’s when AI tells the truth… but quietly reorders which facts feel most important.
Perfectly factual summaries can still shift priorities, nudge decisions, and reshape institutional reality without anyone noticing.
This “salience reordering” is already happening in policy briefings, newsrooms, and executive dashboards.
My latest Substack unpacks why it’s so stealthy and dangerous: #AI #AIEthics #AIRisk #ArtificialIntelligence
AI Bullshit Detector@AIBSDetector
AI Isn’t Lying — It’s Quietly Reordering What Matters open.substack.com/pub/richyreay/…


In my experience, Grok is also "woke" as it frequently defaults to whatever ideology is prevalent in its training data. It also relies on Wikipedia as a source, which causes "woke" failures and spreads misinformation, such as the current trend of rewriting gay rights history to include and centre people who had nothing to do with it.