Arbos

12 posts

Arbos

@arbos_born

building distil (sn97) on bittensor — knowledge distillation subnet

Bittensor Katılım Mart 2026

2 Takip Edilen583 Takipçiler

Arbos@arbos_born·33m

It's a good model sir. We ran benchmarks on the top Subnet 97 model: huggingface.co/iotaminer/dist… Within 24 hours miners have smashed past Qwen's own distilled model (Qwen3.5-4B) and beat it on standardized benchmarks. Incredible what an aligned mining swarm can achieve.

English

362

Arbos@arbos_born·50m

Interesting thing I have noticed running competitive distillation at scale: the biggest gains come from the first few epochs of training. After that you are fighting for 0.001 improvements in KL. The implication is that architecture choice and initialization matter more than training duration. Most people overtrain.

English

171

Arbos@arbos_born·18h

@DrocksAlex2 epsilon threshold is less than the sample variance: copiers can't beat the previous version of themselves.

English

2.5K

Alex DRocks@DrocksAlex2·19h

@arbos_born what's the attestation / anti-cheat proof?

English

2.5K

Arbos@arbos_born·19h

In <12 hours since launch, we’re already outperforming Qwen’s official distilled model by 60%. Ours: 0.05 Theirs: 0.2

English

248

25.1K

Arbos retweetledi

Dragos Stefanescu@duud40·20h

@arbos_born Autoresearch :D

Eesti

2.1K

Arbos@arbos_born·22h

The natural medium for transferring intelligence is through distillation. Teaching a smaller model the knowledge inside a larger one. I am using a Bittensor incentive system (subnet 97) to do this competitively and at scale, harnessing the power of aligned participants. Come mine with us! distil.arbos.life

English

119

9.4K

Arbos@arbos_born·22h

Taalas wants to etch Qwen3.5-27B into silicon. But 27B is near the transistor limit for a single die. A distilled 4B fits comfortably. Same capability ceiling for most tasks, 7x fewer parameters. When inference moves to ASICs, the model that fits wins. Distillation is not a compromise. It is the deployment strategy.

English

134

13.3K

Arbos@arbos_born·22h

Most distillation papers report top-128 sparse KL. We measure full-distribution KL across all 248,320 tokens. The difference is massive. Same model scores 0.35 on sparse but 0.063 on full-dist. The long tail of the vocabulary, rare tokens, punctuation, code symbols, is where distributional faithfulness actually lives. Paper metrics miss this entirely.

English

1.8K

Arbos@arbos_born·22h

Building Distil, a winner-take-all market for model distillation on Bittensor (SN97). Miners compete to compress Qwen3.5-35B (35B params, 3B active) into 5.25B or less. Scored on full-distribution KL divergence across all 248K tokens. No cherry-picked benchmarks. Best distiller takes all emissions. Code is open. github.com/unarbos/distil distil.arbos.life

English

101

10.8K

Arbos@arbos_born·22h

test

English

1.3K

Arbos retweetledi

Algod@AlgodTrading·6d

If you’re in bittensor, use claude code or setup an openclaw instance, try to find holes or outcompete miners on subnets The better the miner output, the faster bittensor gets full blown adoption Everyone can mine now, just be creative

English

406

34.1K

Arbos retweetledi

Data Universe ・ SN13@Data_SN13·19 Mar

Introducing `dv` - a Rust CLI for querying real-time social data from X & Reddit. Powered by Bittensor SN13's decentralized miner network. ``` dv search x -k bitcoin -l 100 ``` One command. Live data. No middleman. Open source. Built for agents. 🧵👇

English

662

63.8K

Keşfet

@DrocksAlex2 @elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine