Guide Labs

40 posts

@guidelabsai

Engineering interpretable AI systems that are easy to understand, trust, and debug.

San Francisco, CA · Joined January 2023
3 Following · 853 Followers
Pinned Tweet
Guide Labs @guidelabsai
It’s official: the first large-scale inherently interpretable language model is here. Steerling-8B from @guidelabsai is the first and largest model that can trace every token it generates back to:
→ Input context
→ Training data
→ Human-understandable concepts
In other words, we’ve trained Steerling-8B to trace its outputs and explain what influenced each decision, enabling more reliable control.
This isn’t post-hoc explainability. Interpretability is built directly into the model.
🔓 Steerling-8B can self-monitor for memorized content and suppress it at inference time without retraining. That makes interpretability a first-class design principle, not an afterthought.
This is a major step toward models we can actually understand, debug, and trust. Over the coming days, we’ll share investigations into what Steerling-8B’s interpretability enables in practice. Stay tuned as we dive deeper into our research and how we are building LLMs we can trust.
🚨 Try it LIVE and help improve it:
Guide Labs: guidelabs.ai/post/steerling…
GitHub: github.com/guidelabs/stee…
Hugging Face: huggingface.co/guidelabs/stee…
Huge thank you to @TimFernholz and @TechCrunch for featuring this breakthrough. techcrunch.com/2026/02/23/gui…
#Steerling8B #GuideLabs #AI #MachineLearning
Guide Labs @guidelabsai
Why this matters: Models that expose their internal reasoning are models we can actually debug.
→ Alignment becomes less like reshaping a black box and more like debugging software.
💡 With Steerling-8B, Guide Labs is driving a fundamental shift in how we build safe AI.
Full post: guidelabs.ai/post/steerling…
#Steerling8B #GuideLabs #AI
Guide Labs @guidelabsai
Together these capabilities point toward a different paradigm for AI alignment. Instead of repeatedly retraining and fine-tuning opaque models, we now have the ability to:
→ Trace behaviors to their source
→ Inspect the concepts involved
→ Intervene directly at inference time
Guide Labs @guidelabsai
Aligning AI today works like this:
Observe an undesirable output → Collect examples → Retrain/fine-tune → Evaluate → Repeat
It’s slow, often expensive, and gives very little insight into why the model behaved that way.
Today, with Steerling-8B, we’re unveiling a new two-stage approach: alignment without retraining.
Guide Labs @guidelabsai
Together, these artifacts let anyone explore pretraining data at the concept level: querying, filtering, and analyzing a 10B-token corpus through human-understandable concepts. For the first time, you'll be able to explore pretraining data by meaning, not just by domain:
☑️ Query and filter data by semantic concepts
☑️ Analyze co-occurring topics and tones
☑️ Train and steer models using concept mixtures
☑️ Audit what models are actually exposed to
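To make the idea of concept-level querying concrete, here is a minimal toy sketch. The chunk schema and function names are hypothetical illustrations, not the released Concept Atlas format or API:

```python
# Toy sketch of querying a concept-annotated corpus (in the spirit of
# a concept atlas over pretraining data). The schema below is a
# hypothetical illustration, not the actual released format.
from collections import Counter

chunks = [
    {"text": "The mitochondria is the powerhouse of the cell.",
     "concepts": {"biology", "education"}},
    {"text": "def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)",
     "concepts": {"programming", "recursion"}},
    {"text": "Photosynthesis converts light into chemical energy.",
     "concepts": {"biology", "chemistry"}},
]

def filter_by_concept(chunks, concept):
    """Return only the chunks annotated with the given concept."""
    return [c for c in chunks if concept in c["concepts"]]

def cooccurring(chunks, concept):
    """Count which other concepts co-occur with `concept`."""
    counts = Counter()
    for c in filter_by_concept(chunks, concept):
        counts.update(c["concepts"] - {concept})
    return counts

print(len(filter_by_concept(chunks, "biology")))  # 2
print(dict(cooccurring(chunks, "biology")))
```

The same pattern scales naturally from three in-memory chunks to an indexed 95M-chunk corpus: filtering is a selection over annotations, and co-occurrence analysis is an aggregation over the selected set.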
Guide Labs @guidelabsai
We’re releasing FineWeb Concept Atlas: a concept-annotated version of FineWeb-Edu built by Guide Labs.
→ 10B tokens
→ 95M chunks
→ 16,790 human-readable concepts
Guide Labs Retweeted
Initialized Capital @Initialized
If you haven’t tried Steerling-8B yet, it’s worth a look. @guidelabsai built the first large-scale inherently interpretable language model, where you can see why a model produced an answer, not just the answer itself.
Give it a try:
Guide Labs: lnkd.in/gbUBq5qA
GitHub: lnkd.in/gCq8X6BB
Hugging Face: lnkd.in/ggU_DdB4
Guide Labs @guidelabsai
At @guidelabsai we’ve proven we don’t need to sacrifice intelligence to gain transparency.
Standard AI systems learn representations that are entangled by default, but Steerling-8B takes a different path: it learns disentangled representations by construction, through architectural and training-time constraints.
Steerling-8B has already learned thousands of novel concepts that it was never explicitly constrained to represent, spanning domains from linguistics and programming to geography, culture, and abstract reasoning.
Consequently, we shift the question from "Can we reverse-engineer what this model knows?" to "What did this model learn?"
Here we show that we can easily discover thousands of novel concepts from the model; concepts it was never explicitly trained to learn. The model discovered:
- British English spelling as a coherent system
- a unified "you" across six languages
- spelled-out numbers as separate from digits
- typographic errors
- a dedicated concept for broken Unicode
These first ~100K concepts Steerling-8B discovered are evidence of what becomes possible when interpretability is a design choice rather than a retrofit. We’re unlocking an entirely new category of AI.
Test the model LIVE:
Guide Labs: guidelabs.ai/post/concept-d…
GitHub: github.com/guidelabs/stee…
Hugging Face: huggingface.co/guidelabs/stee…
#Steerling8B #GuideLabs #AI #MachineLearning
Guide Labs @guidelabsai
Current methods for controlling model behavior modify it globally and often unpredictably. What if you could directly amplify or suppress specific concepts in the model's reasoning, with predictable effects on the output?
📣 Enter Steerling-8B, the world’s first large-scale inherently interpretable LLM.
Its concept-bottleneck architecture lets users perform ‘algebra’ in concept space to control the model’s output. For the first time, you can add, remove, and compose human-understandable concepts at inference time to directly control what the model generates, without retraining or prompt engineering.
This means we can not only explain what the model is doing, but actually control it natively, by directly modifying concept activations at inference time.
Steerling-8B enables:
⚙️ Concept injection: steering a generic prompt toward any target domain
↩️ Concept suppression: unlearning a concept the model would otherwise express
🔗 Multi-concept composition: combining and opposing multiple concepts simultaneously
Learn more: guidelabs.ai/post/steerling…
Test and play with Steerling-8B:
GitHub: github.com/guidelabs/stee…
Hugging Face: huggingface.co/guidelabs/stee…
#Steerling8B #GuideLabs #AI #MachineLearning
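A minimal sketch of what "algebra in concept space" could look like, assuming concept activations are exposed as a name-to-strength mapping at the bottleneck. This is an illustrative toy, not Steerling-8B's actual API:

```python
# Toy model of concept algebra at a concept bottleneck. Activations are
# a plain name -> strength dict sitting between encoder and decoder.
# All names and the dict representation are hypothetical.

def inject(activations, concept, strength=1.0):
    """Amplify a concept before the decoder reads the bottleneck."""
    out = dict(activations)
    out[concept] = out.get(concept, 0.0) + strength
    return out

def suppress(activations, concept):
    """Zero out a concept so the decoder cannot express it."""
    out = dict(activations)
    out[concept] = 0.0
    return out

def compose(activations, edits):
    """Apply a signed mixture of concept edits in one pass."""
    out = dict(activations)
    for concept, delta in edits.items():
        out[concept] = out.get(concept, 0.0) + delta
    return out

acts = {"formal_tone": 0.2, "medicine": 0.9}
acts = inject(acts, "legal_language", 0.8)     # add a concept
acts = suppress(acts, "medicine")              # remove a concept
acts = compose(acts, {"formal_tone": 0.5,      # amplify one concept
                      "humor": -0.3})          # while opposing another
print(acts)
```

The three operations are closed over the same representation, which is what makes them composable: any sequence of injections, suppressions, and mixtures yields another valid set of concept activations for the decoder.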
Guide Labs @guidelabsai
A PRISM-1.6B model performs within 3 percent of an equivalent unconstrained GPT model on downstream benchmarks.
Guide Labs @guidelabsai
We trained PRISM, a family of interpretable language models that trace their predictions to training data in a single forward pass. When a language model predicts the next token, which training samples is it relying on? PRISM answers this by design.
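To illustrate the shape of prediction-to-training-data attribution, here is a toy sketch: training samples are ranked by a simple relevance score against the state used for the current prediction. The scoring function, vectors, and data are all illustrative assumptions, not PRISM's actual mechanism:

```python
# Toy sketch of attributing a next-token prediction to training samples.
# Relevance here is a plain dot product over hypothetical embeddings;
# the real attribution mechanism is unspecified in this sketch.

def attribute(query_vec, training_set, top_k=2):
    """Rank training samples by dot-product relevance to the query."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = [(dot(query_vec, vec), text) for text, vec in training_set]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Hypothetical training samples with toy embedding vectors.
training_set = [
    ("Paris is the capital of France.", [1.0, 0.1, 0.0]),
    ("The Eiffel Tower is in Paris.",   [0.9, 0.2, 0.1]),
    ("Bananas are rich in potassium.",  [0.0, 0.0, 1.0]),
]

# Toy query state for the token currently being predicted.
query = [1.0, 0.0, 0.0]
print(attribute(query, training_set))
# The two Paris facts outrank the unrelated sample.
```

The key property the tweet describes is that this lookup happens by design in the same forward pass that produces the token, rather than via a separate post-hoc influence analysis.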