
We’re pleased to share our weekly F1 score update for Halo (powered by Trishool SN23) vs QwenGuard.

Halo is our guardrail model, and over the past few weeks its performance has improved strongly, steadily closing the gap with QwenGuard.

What does this mean? The F1 score is a single number that tells you whether our guard model is striking the right balance: catching real harmful prompts (high recall) without overflagging benign ones as harmful (high precision).

Our stats:
• We started at 75.0% (Week 1)
• We are now at 87.0% (Week 8), up 12.0 points in just 8 weeks
• The gap to QwenGuard (held at a constant 90% baseline) has narrowed from 15 points to 3 points

This shows that we have a working model and active miners carrying out real work. We will keep updating these stats and sharing them with the community in the coming weeks, and we expect further progress as we approach SOTA.
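
For readers less familiar with the metric, here is a minimal sketch of how an F1 score and the gap to a fixed baseline can be computed. The function names and the confusion counts are hypothetical and chosen only for illustration; they are not Halo's actual evaluation data or pipeline. The only figures taken from this update are the 75.0% and 87.0% scores and the 90% QwenGuard baseline.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: harmful prompts correctly flagged
    fp: benign prompts wrongly flagged as harmful
    fn: harmful prompts that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gap_to_baseline(f1_pct: float, baseline_pct: float = 90.0) -> float:
    """Gap in percentage points between a baseline F1 and a model's F1."""
    return baseline_pct - f1_pct


# Hypothetical counts (an assumption, not real evaluation results).
print(f"Example F1: {f1_score(tp=87, fp=13, fn=13):.3f}")

# Scores reported in this update, against the constant 90% baseline.
print(f"Week 1 gap: {gap_to_baseline(75.0):.1f} points")
print(f"Week 8 gap: {gap_to_baseline(87.0):.1f} points")
```

Because F1 is a harmonic mean, it drops sharply if either precision or recall is low, which is why it is a useful single number for a guard model that must neither miss harmful prompts nor overflag benign ones.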









