Epos AI

11 posts

@EposLabsAI

Pioneering the Future of AI Interpretability. At Epos, we look inside AI models to make them faster, more secure, and more trustworthy.

Joined August 2025

74 Following · 34 Followers

Epos AI @EposLabsAI · Sep 15

With #Polygraph, you can:
- Expose latent biases: Move beyond surface outputs to measure what an LLM encodes as its true belief.
- Contrast topics: Test whether a model encodes different internal stances on Topic A versus Topic B.
- Directly compare how different LLMs represent

146

Epos AI @EposLabsAI · Sep 15

Takeaways:
- The AI community still lacks reliable methods to evaluate and fix LLM failures.
- Interpretability offers outsized impact: the main barrier to progress is that we don’t truly understand today’s models.

145

Epos AI @EposLabsAI · Sep 15

“Safety training” does not eliminate bias in LLMs; it merely conditions models to suppress biased outputs under evaluation. Epos Labs introduces #AI #Polygraph. eposlabs.ai/research/polyg…

434

Epos AI @EposLabsAI · Aug 27

@mdubowitz #Newbackdoorfound

Mark Dubowitz @mdubowitz · Aug 27

New research spotlight: Ever heard of a “subliminal attack”? It’s where AI-generated content influences your thinking without you realizing it—like planting ideological cues under the surface of what you’re reading. If that sounds like sci-fi, think again. Check out the disturbing “#Putinized” demo from eposlabs.ai👇

Epos AI @EposLabsAI

Subliminal Learning Will Power the Next Generation of Influence Operations eposlabs.ai/research/Subli…

4.9K

Epos AI @EposLabsAI · Aug 27

Superposition is the next buffer overflow

104

Epos AI @EposLabsAI · Aug 27

This means that a motivated attacker can abuse entanglement to undetectably manipulate LLMs. Nation-state actors are gearing up for the new opportunities an AI-powered software landscape will open for them:

125

Epos AI @EposLabsAI · Aug 27

Subliminal Learning Will Power the Next Generation of Influence Operations eposlabs.ai/research/Subli…

4.2K

Epos AI @EposLabsAI · Aug 27

Without referencing the target behavior at all, the LLM finds itself with a high probability of performing the target action, due to a fundamental property of the neural network architecture.

150

Epos AI @EposLabsAI · Aug 20

Imagine an article about houseplants that causes AI to support Vladimir Putin. Bad actors are using new attacks to turn AI into a weapon for disinformation and cyberattacks. See our demonstration of a Subliminal Attack here (and our "#Putinized" demo): eposlabs.ai/research/Subli…
