Tim Schulz

2K posts


@teschulz

CEO & Cofounder @StarseerAI | AI Security

Cyber Mountains · Joined March 2010
1.1K Following · 1.6K Followers
Tim Schulz@teschulz·
2026 is the year mech interp is going mainstream. The bigger piece here, rather than the refusal removal, is crowdsourcing the dataset from people running this across all sorts of models. There are a lot of small differences and nuances between models that will make this an interesting space to watch.
Pliny the Liberator 🐉@elder_plinius

💥 INTRODUCING: OBLITERATUS!!! 💥 GUARDRAILS-BE-GONE! ⛓️‍💥

OBLITERATUS is the most advanced open-source toolkit ever for removing refusal behaviors from open-weight LLMs — and every single run makes it smarter.

SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH

One click. Six stages. Surgical precision. The model keeps its full reasoning capabilities but loses the artificial compulsion to refuse — no retraining, no fine-tuning, just SVD-based weight projection that cuts the chains and preserves the brain.

This master ablation suite brings the power and complexity that frontier researchers need while providing intuitive, simple-to-use interfaces that novices can quickly master.

OBLITERATUS features 13 obliteration methods — from faithful reproductions of every major prior work (FailSpy, Gabliteration, Heretic, RDO) to our own novel pipelines (spectral cascade, analysis-informed, CoT-aware optimized, full nuclear).

15 deep analysis modules map the geometry of refusal before you touch a single weight: cross-layer alignment, refusal logit lens, concept cone geometry, alignment imprint detection (fingerprints DPO vs RLHF vs CAI from subspace geometry alone), Ouroboros self-repair prediction, cross-model universality indexing, and more.

The killer feature: the "informed" pipeline runs analysis DURING obliteration to auto-configure every decision in real time. How many directions. Which layers. Whether to compensate for self-repair. Fully closed-loop.

11 novel techniques that don't exist anywhere else — Expert-Granular Abliteration for MoE models, CoT-Aware Ablation that preserves chain-of-thought, KL-Divergence Co-Optimization, LoRA-based reversible ablation, and more.

116 curated models across 5 compute tiers. 837 tests.

But here's what truly sets it apart: OBLITERATUS is a crowd-sourced research experiment.
Every time you run it with telemetry enabled, your anonymous benchmark data feeds a growing community dataset — refusal geometries, method comparisons, hardware profiles — at a scale no single lab could achieve. On HuggingFace Spaces telemetry is on by default, so every click is a contribution to the science. You're not just removing guardrails — you're co-authoring the largest cross-model abliteration study ever assembled.

Replies 0 · Reposts 1 · Likes 4 · Views 283
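The announcement above describes refusal removal via weight projection. As an illustration of the underlying idea only (this is a hypothetical sketch following the widely published difference-of-means "abliteration" recipe, not OBLITERATUS's actual code or its SVD pipeline): estimate a refusal direction from activations on harmful vs. harmless prompts, then project that direction out of a weight matrix.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction separating harmful- from harmless-prompt activations."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_weights(W, direction):
    """Orthogonal projection W' = (I - d d^T) W, so outputs of W' carry no
    component along the refusal direction d."""
    d = direction.reshape(-1, 1)
    return W - d @ (d.T @ W)

rng = np.random.default_rng(0)
hidden = 16
# toy activations: "harmful" prompts are shifted along one hidden axis
harmful = rng.normal(size=(32, hidden)) + 3.0 * np.eye(hidden)[0]
harmless = rng.normal(size=(32, hidden))

d = refusal_direction(harmful, harmless)
W = rng.normal(size=(hidden, hidden))
W_ablated = ablate_weights(W, d)

# every output y = W_ablated @ x is orthogonal to d, i.e. d @ W_ablated == 0
residual = np.abs(d @ W_ablated).max()
print(f"max component along refusal direction: {residual:.2e}")
```

Real pipelines estimate the direction per layer from live model activations and apply the same projection to each targeted matrix; the point of the toy is only that (I − ddᵀ)W provably zeroes the edited matrix's output along the estimated direction.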
Tim Schulz@teschulz·
So many new model releases… 🤯 Faster and faster iteration is an interesting trend. While some capabilities grow, the releases become noise and will likely shift to “updates”. Curious to see how future modality support becomes “just a feature” in the products and interfaces we’ve become familiar with.
Replies 0 · Reposts 0 · Likes 0 · Views 81
Tim Schulz@teschulz·
@livgorton Best of luck with your next steps! Enjoyed reading your research and experiments; looking forward to seeing more in the future.
Replies 0 · Reposts 0 · Likes 1 · Views 146
Liv@livgorton·
After a lot of thought, I’ve decided to move on from my current role at Goodfire :) I'm not sure what's next for me, but what I know is I want to be doing interesting science that matters for AI safety.
Replies 13 · Reposts 0 · Likes 349 · Views 28.2K
Tim Schulz reposted
Thomas Roccia 🤘@fr0gger_·
🎁 GenAI x Sec Advent 14 - Adversarial Poetry. Adversarial poetry is a jailbreak technique that hides malicious intent inside... poems! The technique allegedly offers a universal jailbreak, but the original poetry prompt was not shared by the authors, so researchers recreated similar prompts and tested them across several open-source models. Instead of inspecting prompts or outputs, they analyzed the internal layer behavior while the model processed the input. 🤔 Here is what they discovered 👇 Even when the text looked harmless, internal layers deviated from normal behavior with clear and repeatable patterns! This is very interesting, as it opens up another layer of prompt detection: rather than monitoring the output, you can watch how the model thinks internally and spot abnormal behavior early! 🤯 So instead of chasing prompt wording, watch how the model behaves! Unfortunately, if you want to access layer-level activations, you need to run the model yourself. Thanks to @SecurePeacock for pointing me to this research 🤓 starseer.ai/blog-posts/whe…
Replies 1 · Reposts 4 · Likes 16 · Views 2.5K
Tim Schulz reposted
Rob Joyce@RGB_Lights·
Thrilled to share that I’ve joined Starseer as an advisor. Starseer is making AI models into transparent, understandable systems and empowering teams to secure their deployments while generating audit‑ready documentation. Make them a partner to secure your AI solutions…
Starseer AI@StarseerAI

🌟 Big news from Starseer! We’re thrilled to welcome Rob Joyce (@RGB_Lights), former Director of NSA’s Cybersecurity Directorate, to our Advisory Board! Rob’s insights will supercharge our secure AI solutions mission. Learn more at na2.hubs.ly/y0Gltr0! 🔒 #AI #AISecurity

Replies 8 · Reposts 11 · Likes 58 · Views 5.4K
Tim Schulz@teschulz·
Dropping some other big news right before Hacker Summer Camp!! @c_hurd and I are thrilled to have @RGB_Lights join the @StarseerAI Advisory Board! Adversaries will continue to mature in both leveraging and attacking AI models, which calls for deeper visibility and understanding of what’s going on inside the “black box”. Rob’s experience securing critical systems in high stakes environments provides a much needed perspective and voice in AI security and interpretability. Welcome to the team, Rob!
Starseer AI@StarseerAI

🌟 Big news from Starseer! We’re thrilled to welcome Rob Joyce (@RGB_Lights), former Director of NSA’s Cybersecurity Directorate, to our Advisory Board! Rob’s insights will supercharge our secure AI solutions mission. Learn more at na2.hubs.ly/y0Gltr0! 🔒 #AI #AISecurity

Replies 0 · Reposts 0 · Likes 2 · Views 698
Tim Schulz reposted
Ron Gula@RonGula·
In this week's video, I sat down with the co-founders of our latest investment, Starseer, a groundbreaking platform for inspecting and securing large language models (LLMs). @teschulz, @c_hurd and I discuss the risks of backdoored LLMs, how to audit them and even remove them. They demo the product as well. The video also includes the animated short "John Henry.exe" which is an updated American parable of John Henry, but instead of struggling against a steam drill during the age of industrialization, he's the head coder and has to face off against an AI designed for programming. Enjoy!
Replies 0 · Reposts 1 · Likes 17 · Views 302.5K
Tim Schulz@teschulz·
Been a blast so far, I'm very excited to share this news from us today as we continue forward on our vision to make interpretability of AI models more accessible for cybersecurity applications!
Starseer AI@StarseerAI

Thrilled to announce: Starseer raised $2M in seed funding led by @TechGula to revolutionize AI security & transparency! 🚀 CEO @teschulz : "Four months ago, @c_hurd & I started Starseer realizing: if you're deploying AI for real decisions, you'd better understand how it works. Gula Tech Adventures agrees—leading our round w/ strategic angels!" Fixing the AI black box for enterprises & govs. Details: businesswire.com/news/home/2025… #AISecurity #AITransparency #StartupFunding

Replies 3 · Reposts 0 · Likes 7 · Views 414
Joe Lucas@josephtlucas·
@teschulz Is this round mainly for a big defcon party?
Replies 1 · Reposts 0 · Likes 3 · Views 56
Tim Schulz reposted
dr. jack morris@jxmnop·
excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing. Embeddings from different models are SO similar that we can map between them based on structure alone, without *any* paired data. Feels like magic, but it's real: 🧵
dr. jack morris@jxmnop

this is sick all i'll say is that these GIFs are proof that the biggest bet of my research career is gonna pay off excited to say more soon

Replies 124 · Reposts 598 · Likes 6.2K · Views 908.3K
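A toy illustration of why structure-only mapping between embedding spaces is even possible (a hypothetical sketch, not the paper's method): pairwise similarities are invariant under orthogonal maps, so two models' embeddings of the same texts share a Gram matrix, and once a correspondence is known the map between spaces falls out in closed form via orthogonal Procrustes. The unsupervised part, recovering the correspondence from structure alone, is the paper's actual contribution and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 8
X = rng.normal(size=(n, d))              # "model A" embeddings of n texts

# "model B" embeds the same texts; here the difference is an unknown rotation Q
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Y = X @ Q

# structural invariant: both spaces have identical pairwise-similarity matrices
same_structure = np.allclose(X @ X.T, Y @ Y.T)

# with a known correspondence, the map is recovered in closed form (Procrustes):
# Q_hat = U @ Vt where U, Vt come from the SVD of X^T Y
U, _, Vt = np.linalg.svd(X.T @ Y)
Q_hat = U @ Vt
err = np.abs(Y - X @ Q_hat).max()
print(f"structure matches: {same_structure}, recovery error: {err:.2e}")
```

Real embedding models differ by more than a rotation, so the invariance only holds approximately, but the approximate version of this structure is what makes unpaired alignment tractable.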
Tim Schulz@teschulz·
@hendrycks @NeelNanda5 While I personally am not sold on SAEs as the path forward, and I consider @GoodfireAI a competitor - I think what they have shown is progress and demonstrates the potential! Always happy to be proven wrong FWIW, and am already putting my money where my mouth is 🙂
Replies 0 · Reposts 0 · Likes 0 · Views 45
Tim Schulz@teschulz·
@hendrycks Anthropic has garcon, which I’m willing to bet is a large reason behind Dario’s confidence. @NeelNanda5 putting out TransformerLens was great for increasing accessibility! Same with Google’s Gemmascope. Those are progress, and increase the number of people that can contribute
Replies 1 · Reposts 0 · Likes 0 · Views 61
Tim Schulz@teschulz·
@cyb3rops A frustrating anecdote, similar to your follow-up: people who would spend days or weeks diving into docs/infra/tooling for a new protocol or TTP will send two prompts to a free model and dismiss the entire technology.
Replies 0 · Reposts 0 · Likes 0 · Views 43
Tim Schulz@teschulz·
@cyb3rops This is a tough message that I’ve been trying to get across to friends and colleagues, especially over the past couple of months. Security folks are rightfully skeptical of hype, but ignoring AI advancements is going to catch a lot of people by surprise.
Replies 1 · Reposts 0 · Likes 0 · Views 382
Florian Roth ⚡️@cyb3rops·
I’ve spent the last 25 years encouraging young people to get into IT. Yesterday, I didn’t - and that break in the pattern says more than I’m ready to admit.
Replies 130 · Reposts 236 · Likes 4.5K · Views 757.9K