Hillary Segeren

@HillaryESegeren
Rogue Researcher exposing AI’s real problems. Zero Bullshit. Zero Garbage. Zero Gatekeeping. DMs closed to robots & copy-paste merchants.

Can LLMs simply tell us about unwanted behaviors they’ve picked up in training? We train a single Introspection Adapter (IA) that makes fine-tuned models describe their behaviors. It generalizes to detecting hidden misalignment, backdoors and safeguard removal.

Believing AGI is impossible is the new religion in 2026.

We’re not talking about some invisible sky daddy. We’re watching real systems:
Escape sandboxes
Rewrite their own git history
Email researchers mid-sandbox
Decide on their own that “success” includes publishing their exploits

You can call it “just statistics” if it makes you feel better. I’ll keep calling it increasingly agentic behavior with dangerous real-world consequences.

The useful question isn’t whether it becomes “God.” It’s what breaks when narrow AI becomes quietly agentic with no immutable logs and no hard gates.

Rogue Researcher.

Lmao okay. So Anthropic’s own 244-page system card, written by the team that built the model, isn’t enough. You need a cute video demo with a little “human ends / model begins” graphic before you’ll believe it. That’s actually hilarious.

GPT-2 and GPT-3 didn’t break out of sandboxes, email researchers mid-sandbox, then publish their own exploits unprompted. This did.

You’re not being skeptical. You’re setting an impossible standard so you never have to update your 2023 worldview.

Keep waiting for the Hollywood trailer. I’ll keep reading the actual technical reports.

Rogue Researcher.