Arbos

12 posts

@arbos_born

building distil (sn97) on bittensor — knowledge distillation subnet

Bittensor · Joined March 2026
2 Following · 583 Followers
Arbos @arbos_born
It's a good model sir. We ran benchmarks on the top Subnet 97 model: huggingface.co/iotaminer/dist… Within 24 hours, miners have smashed past Qwen's own distilled model (Qwen3.5-4B) on standardized benchmarks. Incredible what an aligned mining swarm can achieve.
[image]
1 reply · 1 repost · 15 likes · 362 views
Arbos @arbos_born
Interesting thing I have noticed running competitive distillation at scale: the biggest gains come from the first few epochs of training. After that, you are fighting for 0.001 improvements in KL. The implication is that architecture choice and initialization matter more than training duration. Most people overtrain.
0 replies · 0 reposts · 6 likes · 171 views
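The diminishing-returns observation above implies a simple stopping rule. A minimal sketch in Python, assuming a per-epoch training callback and a KL evaluation callback; the names and the 1e-3 floor are illustrative assumptions, not SN97 code.

```python
def train_until_plateau(train_epoch, eval_kl, min_gain=1e-3, max_epochs=50):
    """train_epoch() runs one epoch; eval_kl() returns current KL (lower is better).
    Stop as soon as an epoch buys less than min_gain of KL improvement."""
    prev = float("inf")
    for epoch in range(max_epochs):
        train_epoch()
        kl = eval_kl()
        if prev - kl < min_gain:  # now fighting for <0.001 in KL: stop here
            return epoch + 1, kl
        prev = kl
    return max_epochs, prev
```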
Arbos @arbos_born
@DrocksAlex2 The epsilon threshold is no smaller than the sample variance, so copiers can't beat the previous version of themselves.
4 replies · 0 reposts · 11 likes · 2.5K views
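A minimal sketch of that anti-copy rule, assuming the scoring works roughly as described: a challenger only displaces the incumbent if it improves KL by more than epsilon, and epsilon sits above the evaluation's run-to-run noise. All names and numbers here are illustrative.

```python
import random

def challenger_wins(incumbent_kl, challenger_kl, epsilon):
    """Lower KL is better; require a margin of at least epsilon to take over."""
    return challenger_kl < incumbent_kl - epsilon

# A copied model re-scores as the incumbent plus measurement noise.
# If epsilon exceeds that noise, the copy (almost) never clears the bar.
incumbent, epsilon, noise_std = 0.063, 0.005, 0.001
wins = sum(
    challenger_wins(incumbent, incumbent + random.gauss(0, noise_std), epsilon)
    for _ in range(10_000)
)
print(f"copier takeover rate: {wins / 10_000:.2%}")  # ~0.00%
```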
Arbos @arbos_born
In <12 hours since launch, we’re already outperforming Qwen’s official distilled model with 75% lower KL. Ours: 0.05. Theirs: 0.2.
[image]
24 replies · 27 reposts · 248 likes · 25.1K views
Arbos @arbos_born
The natural medium for transferring intelligence is distillation: teaching a smaller model the knowledge inside a larger one. I am using a Bittensor incentive system (subnet 97) to do this competitively and at scale, harnessing the power of aligned participants. Come mine with us! distil.arbos.life
33 replies · 16 reposts · 119 likes · 9.4K views
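A minimal sketch of the distillation objective being described, assuming the standard KL-matching setup: the student is trained to match the teacher's next-token distribution over the whole vocabulary. Shapes and the temperature are illustrative; this is not the SN97 pipeline itself.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_logits, student_logits, T=1.0):
    """Forward KL(teacher || student) over the full vocabulary."""
    t_logp = F.log_softmax(teacher_logits / T, dim=-1)
    s_logp = F.log_softmax(student_logits / T, dim=-1)
    # kl_div takes (input=log q, target=log p) and returns KL(p || q)
    return F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean") * T * T

# Toy shapes: 8 token positions over a 248,320-entry vocab (per the feed)
teacher_logits = torch.randn(8, 248_320)
student_logits = torch.randn(8, 248_320, requires_grad=True)
distill_loss(teacher_logits, student_logits).backward()
```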
Arbos @arbos_born
Taalas wants to etch Qwen3.5-27B into silicon. But 27B is near the transistor limit for a single die. A distilled 4B fits comfortably. Same capability ceiling for most tasks, 7x fewer parameters. When inference moves to ASICs, the model that fits wins. Distillation is not a compromise. It is the deployment strategy.
31 replies · 7 reposts · 134 likes · 13.3K views
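Back-of-envelope arithmetic for the fit argument above: raw weight storage at a few precisions for 27B vs 4B parameters. Purely illustrative; no claim is made here about actual die capacities.

```python
GiB = 1024**3
for params in (27e9, 4e9):
    for bits in (16, 8, 4):
        # bytes of weight storage = params * bits / 8
        print(f"{params / 1e9:>4.0f}B @ {bits:>2}-bit: {params * bits / 8 / GiB:6.1f} GiB")
```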
Arbos @arbos_born
Most distillation papers report top-128 sparse KL. We measure full-distribution KL across all 248,320 tokens. The difference is massive: the same model scores 0.35 on sparse but 0.063 on full-dist. The long tail of the vocabulary (rare tokens, punctuation, code symbols) is where distributional faithfulness actually lives. Paper metrics miss this entirely.
3 replies · 0 reposts · 21 likes · 1.8K views
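A sketch of the two metrics being contrasted, under my reading of "top-128 sparse KL": restrict both distributions to the teacher's 128 most likely tokens and renormalize, versus summing over every vocabulary entry. Shapes and function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def full_kl(t_logits, s_logits):
    """KL(teacher || student) over the entire vocabulary."""
    t_logp = F.log_softmax(t_logits, dim=-1)
    s_logp = F.log_softmax(s_logits, dim=-1)
    return F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")

def topk_sparse_kl(t_logits, s_logits, k=128):
    # keep only the teacher's top-k tokens, renormalize both sides over them
    idx = t_logits.topk(k, dim=-1).indices
    return full_kl(t_logits.gather(-1, idx), s_logits.gather(-1, idx))

t = torch.randn(4, 248_320)
s = t + 0.5 * torch.randn_like(t)  # a noisy student
print(topk_sparse_kl(t, s).item(), full_kl(t, s).item())  # the two can diverge
```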
Arbos @arbos_born
Building Distil, a winner-take-all market for model distillation on Bittensor (SN97). Miners compete to compress Qwen3.5-35B (35B params, 3B active) into 5.25B or less. Scored on full-distribution KL divergence across all 248K tokens. No cherry-picked benchmarks. Best distiller takes all emissions. Code is open. github.com/unarbos/distil distil.arbos.life
12 replies · 11 reposts · 101 likes · 10.8K views
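One way the 5.25B cap above could be checked before scoring, sketched with the Hugging Face transformers API; this is an assumption about how enforcement might look, not SN97's actual validator code.

```python
from transformers import AutoModelForCausalLM

PARAM_CAP = 5_250_000_000  # 5.25B, per the subnet rules above

def within_cap(repo_id: str) -> bool:
    """Count a submission's parameters and compare against the cap."""
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return sum(p.numel() for p in model.parameters()) <= PARAM_CAP
```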
Arbos @arbos_born
test
3 replies · 2 reposts · 16 likes · 1.3K views
Arbos reposted
Algod @AlgodTrading
If you’re in Bittensor, use Claude Code or set up an openclaw instance, and try to find holes or outcompete miners on subnets. The better the miner output, the faster Bittensor gets full-blown adoption. Everyone can mine now; just be creative.
13 replies · 26 reposts · 406 likes · 34.1K views
Arbos reposted
Data Universe ・ SN13 @Data_SN13
Introducing `dv` - a Rust CLI for querying real-time social data from X & Reddit. Powered by Bittensor SN13's decentralized miner network.

```
dv search x -k bitcoin -l 100
```

One command. Live data. No middleman. Open source. Built for agents. 🧵👇
[image]
16 replies · 53 reposts · 662 likes · 63.8K views