Tim Schulz

2K posts

@teschulz

CEO & Cofounder @StarseerAI | AI Security

Cyber Mountains · Joined March 2010

1.1K Following · 1.6K Followers

Tim Schulz@teschulz·6 Mar

@sephr @RGB_Lights @StarseerAI Not quite yet but I believe they were recorded so guessing they’ll be online in the coming weeks

🕊@sephr·5 Mar

@RGB_Lights @teschulz @StarseerAI Any videos posted online? Interested to check this out

Tim Schulz reposted

Rob Joyce@RGB_Lights·5 Mar

This is interesting and important research. Worth a look! @teschulz @StarseerAI

Christina Ayiotis, Esq., CRM, CIPP/E, AIGP@christinayiotis

[un]prompted The AI Security Practitioner Conference: "Glass-Box Security: Operationalizing Mechanistic Interpretability for Defending AI Agents" with Carl Hurd, Co-Founder & CTO, Starseer @StarseerAI

4.4K

Tim Schulz@teschulz·5 Mar

Our first conference talk on our work at @StarseerAI to move mech interp from the academic space to scalable security solutions

Christina Ayiotis, Esq., CRM, CIPP/E, AIGP@christinayiotis

692

Tim Schulz@teschulz·5 Mar

2026 is the year mech interp is going mainstream. The bigger piece here rather than the refusal removal is crowdsourcing the dataset from people running this across all sorts of models. There are a lot of small differences and nuances between models that will make this an interesting space to watch.

Pliny the Liberator 🐉@elder_plinius

💥 INTRODUCING: OBLITERATUS!!! 💥 GUARDRAILS-BE-GONE! ⛓️‍💥 OBLITERATUS is the most advanced open-source toolkit ever for removing refusal behaviors from open-weight LLMs — and every single run makes it smarter. SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH One click. Six stages. Surgical precision. The model keeps its full reasoning capabilities but loses the artificial compulsion to refuse — no retraining, no fine-tuning, just SVD-based weight projection that cuts the chains and preserves the brain. This master ablation suite brings the power and complexity that frontier researchers need while providing intuitive and simple-to-use interfaces that novices can quickly master. OBLITERATUS features 13 obliteration methods — from faithful reproductions of every major prior work (FailSpy, Gabliteration, Heretic, RDO) to our own novel pipelines (spectral cascade, analysis-informed, CoT-aware optimized, full nuclear). 15 deep analysis modules that map the geometry of refusal before you touch a single weight: cross-layer alignment, refusal logit lens, concept cone geometry, alignment imprint detection (fingerprints DPO vs RLHF vs CAI from subspace geometry alone), Ouroboros self-repair prediction, cross-model universality indexing, and more. The killer feature: the "informed" pipeline runs analysis DURING obliteration to auto-configure every decision in real time. How many directions. Which layers. Whether to compensate for self-repair. Fully closed-loop. 11 novel techniques that don't exist anywhere else — Expert-Granular Abliteration for MoE models, CoT-Aware Ablation that preserves chain-of-thought, KL-Divergence Co-Optimization, LoRA-based reversible ablation, and more. 116 curated models across 5 compute tiers. 837 tests. But here's what truly sets it apart: OBLITERATUS is a crowd-sourced research experiment. 
Every time you run it with telemetry enabled, your anonymous benchmark data feeds a growing community dataset — refusal geometries, method comparisons, hardware profiles — at a scale no single lab could achieve. On HuggingFace Spaces telemetry is on by default, so every click is a contribution to the science. You're not just removing guardrails — you're co-authoring the largest cross-model abliteration study ever assembled.

283

Tim Schulz@teschulz·17 Dec

So many new model releases…🤯 Faster and faster iteration is an interesting trend. While some capabilities grow, the releases become noise and will likely shift to “updates”. Curious to see how future modality support becomes “just a feature” in the products and interfaces we’ve become familiar with.

Tim Schulz@teschulz·17 Dec

@livgorton Best of luck with your next steps! Enjoyed reading your research and experiments, look forward to seeing more in the future

146

Liv@livgorton·16 Dec

After a lot of thought, I’ve decided to move on from my current role at Goodfire :) I'm not sure what's next for me, but what I know is I want to be doing interesting science that matters for AI safety.

349

28.2K

Tim Schulz reposted

Thomas Roccia 🤘@fr0gger_·14 Dec

🎁 GenAI x Sec Advent 14 - Adversarial Poetry. Adversarial poetry is a jailbreak technique that hides malicious intent inside... poems! This technique allegedly offers a universal jailbreak. But the original poetry prompt was not shared by the authors, so researchers recreated similar prompts and tested them across several open source models. Instead of inspecting prompts or outputs, they analyzed the internal layer behavior while the model processed the input. 🤔 Here is what they discovered 👇 Even when the text looked harmless, internal layers deviated from normal behavior with clear and repeatable patterns! This is very interesting, as it opens another layer of prompt detection: rather than monitoring the output, you can watch how the model thinks internally and spot abnormal behavior early! 🤯 So instead of chasing prompt wording, watch how the model behaves! Unfortunately, if you want to access layer-level activations, you need to run the model yourself. Thanks to @SecurePeacock for pointing me to this research 🤓 starseer.ai/blog-posts/whe…

2.5K

Tim Schulz reposted

Rob Joyce@RGB_Lights·4 Aug

Thrilled to share that I’ve joined Starseer as an advisor. Starseer is making AI models into transparent, understandable systems and empowering teams to secure their deployments while generating audit‑ready documentation. Make them a partner to secure your AI solutions…

Starseer AI@StarseerAI

🌟 Big news from Starseer! We’re thrilled to welcome Rob Joyce (@RGB_Lights), former Director of NSA’s Cybersecurity Directorate, to our Advisory Board! Rob’s insights will supercharge our secure AI solutions mission. Learn more at na2.hubs.ly/y0Gltr0! 🔒 #AI #AISecurity

5.4K

Tim Schulz@teschulz·4 Aug

Dropping some other big news right before Hacker Summer Camp!! @c_hurd and I are thrilled to have @RGB_Lights join the @StarseerAI Advisory Board! Adversaries will continue to mature in both leveraging and attacking AI models, which calls for deeper visibility and understanding of what’s going on inside the “black box”. Rob’s experience securing critical systems in high stakes environments provides a much needed perspective and voice in AI security and interpretability. Welcome to the team, Rob!

Starseer AI@StarseerAI

698

Tim Schulz reposted

Ron Gula@RonGula·23 Jul

In this week's video, I sat down with the co-founders of our latest investment, Starseer, a groundbreaking platform for inspecting and securing large language models (LLMs). @teschulz, @c_hurd and I discuss the risks of backdoored LLMs, how to audit them and even remove them. They demo the product as well. The video also includes the animated short "John Henry.exe" which is an updated American parable of John Henry, but instead of struggling against a steam drill during the age of industrialization, he's the head coder and has to face off against an AI designed for programming. Enjoy!

302.5K

Tim Schulz@teschulz·23 Jul

@Cyb3rWard0g Thank you!

Roberto Rodriguez 🇵🇪@Cyb3rWard0g·23 Jul

@teschulz That’s awesome! Congratulations @teschulz 🎉🎉 Happy for you and the team! 🙏

Tim Schulz@teschulz·23 Jul

Been a blast so far, I'm very excited to share this news from us today as we continue forward on our vision to make interpretability of AI models more accessible for cybersecurity applications!

Starseer AI@StarseerAI

Thrilled to announce: Starseer raised $2M in seed funding led by @TechGula to revolutionize AI security & transparency! 🚀 CEO @teschulz : "Four months ago, @c_hurd & I started Starseer realizing: if you're deploying AI for real decisions, you'd better understand how it works. Gula Tech Adventures agrees—leading our round w/ strategic angels!" Fixing the AI black box for enterprises & govs. Details: businesswire.com/news/home/2025… #AISecurity #AITransparency #StartupFunding

414

Tim Schulz@teschulz·23 Jul

@josephtlucas Don’t know if it would cover that 😂

Joe Lucas@josephtlucas·23 Jul

@teschulz Is this round mainly for a big defcon party?

Tim Schulz reposted

dr. jack morris@jxmnop·21 May

excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing
embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data
feels like magic, but it's real:🧵

dr. jack morris@jxmnop

this is sick all i'll say is that these GIFs are proof that the biggest bet of my research career is gonna pay off excited to say more soon

124

598

6.2K

908.3K

Tim Schulz@teschulz·16 May

@hendrycks @NeelNanda5 While I personally am not sold on SAEs as the path forward, and I consider @GoodfireAI a competitor - I think what they have shown is progress and demonstrates the potential! Always happy to be proven wrong FWIW, and am already putting my money where my mouth is 🙂

Tim Schulz@teschulz·16 May

@hendrycks Anthropic has garcon, which I’m willing to bet is a large reason behind Dario’s confidence. @NeelNanda5 putting out TransformerLens was great for increasing accessibility! Same with Google’s Gemmascope. Those are progress, and increase the number of people that can contribute

Dan Hendrycks@hendrycks·15 May

I wrote about why efforts to understand the inner workings of AI keep falling short.

AI Frontiers@ai_frontiers_

x.com/i/article/1923…

342

106.3K

Tim Schulz@teschulz·9 May

@cyb3rops A frustrating anecdote similar to your follow-up: people who would spend days or weeks diving into docs/infra/tooling for a new protocol or TTP will send two prompts on a free model and dismiss the entire technology.

Tim Schulz@teschulz·9 May

@cyb3rops This is a tough message I’ve been trying to convey more to friends and colleagues, especially over the past couple of months. Security folks are rightfully skeptical of hype, but ignoring AI advancements is going to catch a lot of people by surprise.

382

Florian Roth ⚡️@cyb3rops·8 May

I’ve spent the last 25 years encouraging young people to get into IT. Yesterday, I didn’t - and that break in the pattern says more than I’m ready to admit.

130

236

4.5K

757.9K
