⚡🛡️ Evan Pappas

6.7K posts


@Hevalon

🛡️ Ex Technologia Libertas - "Freedom through technology" - (Dec/Acc)

epappas.eth · Joined September 2009
4K Following · 1.3K Followers
Ken Jon
Ken Jon@kenjon·
99.7% confirmed real by @bitmind. The 0.3% was the clip saying the model was 4B vs the actual 72B size. it happens
Ken Jon tweet media
templar@tplr_ai

On the @theallinpod this week, @chamath asked @nvidia CEO Jensen Huang about decentralized AI training, calling our Covenant-72B run "a pretty crazy technical accomplishment." One correction: it's 72 billion parameters, not four. Trained permissionlessly across 70+ contributors on commodity internet. The largest model ever pre-trained on fully decentralized infrastructure. Jensen's answer is worth hearing too.

10 replies · 18 reposts · 183 likes · 11.2K views
⚡🛡️ Evan Pappas retweeted
templar
templar@tplr_ai·
(same post as quoted above)
68 replies · 322 reposts · 1.4K likes · 276.5K views
⚡🛡️ Evan Pappas retweeted
Mark Jeffrey
Mark Jeffrey@markjeffrey·
Bittensor peeps: check out 31:44 - Templar sn3 discussed. @chamath -- they've achieved a *72* billion parameter model with decentralized training, not a 4 billion parameter model :)
17 replies · 80 reposts · 324 likes · 49K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
When you fix one bottleneck, the next one becomes visible. At @covenant_ai we built PULSE (arxiv.org/abs/2602.03839) to make weight sync 100× faster. That worked. Then the trainer itself became the new ceiling. So @erfan_mhi ran autoresearch on our GRPO trainer. 27% → 47% MFU. 16.7 min → 9.2 min per epoch. 1.8× faster on a single B200. Decentralized post-training, closing the gap with centralized. github.com/tplr-ai/grail
Erfan Miahi@erfan_mhi

Used autoresearch to make @grail_ai GRPO trainer 1.8x faster on a single B200.

I kept postponing this for weeks since the bottleneck in our decentralized framework was mainly communication. But after our proposed technique, PULSE, made weight sync 100x faster, the training update itself became the bottleneck. Even with a fully async trainer and inference, a slow trainer kills convergence speed.

A task that could've eaten days of my time ran in parallel while I worked on other stuff. Unlike original autoresearch, where each experiment is 5 min, our feedback loop is way longer (10-17 min per epoch + 10-60 minutes of installations and code changes), so I did minimal steering when it was heading in bad directions to avoid burning GPU hours.

The agent tried so many things that failed. But it eventually found the wins: Liger kernel, sequence packing, token-budget dynamic batching, and native FA4 via AttentionInterface. 27% to 47% MFU. 16.7 min to 9.2 min per epoch.

If you wanna dig deeper or contribute: github.com/tplr-ai/grail

We're optimizing everything at the scale of global nodes to make decentralized post-training as fast as centralized ones. Stay tuned for some cool models coming out of this effort. Cheers!

4 replies · 16 reposts · 103 likes · 6.8K views
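One of the wins listed above, token-budget dynamic batching, is simple to sketch: rather than a fixed batch size, sequences are grouped so each step stays under a token budget, keeping per-step work and memory roughly constant. A minimal illustration in Python; the function name and the greedy grouping are assumptions for illustration, not code from the grail repo.

```python
def token_budget_batches(seq_lens, max_tokens):
    """Greedily group sequence indices so each batch stays within a token budget."""
    batches, current, current_tokens = [], [], 0
    # Visiting sequences shortest-first also pairs naturally with sequence
    # packing, since similar-length neighbours minimise padding waste.
    for idx in sorted(range(len(seq_lens)), key=lambda i: seq_lens[i]):
        n = seq_lens[idx]
        if current and current_tokens + n > max_tokens:
            batches.append(current)          # budget exceeded: start a new batch
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Long sequences land in small batches, short ones in large batches:
print(token_budget_batches([512, 128, 1024, 256, 64], max_tokens=1024))
# → [[4, 1, 3, 0], [2]]
```

The effect is the one the tweet describes: token throughput per step becomes nearly constant regardless of the sequence-length distribution, which is part of what pushes MFU up.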
⚡🛡️ Evan Pappas retweeted
Erfan Miahi
Erfan Miahi@erfan_mhi·
(same post as quoted above)
Erfan Miahi tweet media
1 reply · 14 reposts · 58 likes · 14.4K views
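The reported figures are also internally consistent, reading MFU in its usual sense (achieved model FLOPs divided by hardware peak), so that with fixed work per epoch, wall-clock time scales inversely with MFU:

```python
# Cross-check: wall-clock speedup vs. MFU gain from the figures above.
# Same model and tokens per epoch => time per epoch ~ 1 / MFU.
epoch_speedup = 16.7 / 9.2      # minutes per epoch, before / after
mfu_gain = 0.47 / 0.27          # MFU after / MFU before

print(round(epoch_speedup, 2))  # 1.82 -- the "1.8x faster" claim
print(round(mfu_gain, 2))       # 1.74 -- agrees to within measurement noise
```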
⚡🛡️ Evan Pappas
⚡🛡️ Evan Pappas@Hevalon·
OG miners joining in this TGIF
GIF
templar@tplr_ai

TGIF #29 tomorrow! The Covenant-72B thread reached well beyond Bittensor this week. @DistStateAndMe and the full @covenant_ai team talk about what that traction means, where decentralized AI sits in the broader conversation, and what comes next. Miners, come celebrate with us. We are opening the stage to anyone who contributed compute to the run or has a story to share. Request to speak and we will bring you up. x.com/i/spaces/1oKMv…

0 replies · 2 reposts · 13 likes · 667 views
⚡🛡️ Evan Pappas retweeted
Joel Lidin
Joel Lidin@joellidin·
The hardest part of training a 72B model over the internet with untrusted peers isn't the optimizer or the bandwidth. It's that you can't see what anyone is doing. Every decision about the run (learning rate, intervention timing, participation thresholds) is made partially blind.

That problem never goes away. You just get better at working around it. Having done the hands-on mining work first gave me a better sense of it.

Covenant-72B is the largest model ever pre-trained in a fully permissionless setting. And it holds up against centralized 70B models. arxiv.org/abs/2603.08163 | @covenant_ai
templar@tplr_ai

We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n

11 replies · 39 reposts · 352 likes · 31.5K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
A small step for mankind, a massive leap for decentralised training... for agency.

In the space of 9 months, @tplr_ai went from 1.2B -> 72B. It's never been easy, and has broken everyone on the team multiple times. But I speak for all of us when I say it is the most rewarding thing we have ever done.

We have a fraction of the resources. We don't have the PhDs. But Bittensor shows you it doesn't matter. Innovation happens at the edge. We innovate through scarcity. The ones who rewrite the rules are never the ones with the most. They're the ones who refuse to accept the limits they were handed.

Bittensor is prophecy. Subnets (@covenant_ai and others) are the tools through which that prophecy is manifested. Next stop: TRILLIONS.
templar@tplr_ai

(same post as quoted above)

18 replies · 33 reposts · 249 likes · 18.1K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
There is no higher mission than growing Yuma Rao's garden. Happy to support the next generation of subnet owners through @basilic_ai
covenant@covenant_ai

Seven subnet ideas are heading to testnet, backed by @basilic_ai compute credits. Basilica sponsors the Ideathon because the best way to validate a subnet design is to run it, and compute should not be the bottleneck. Congratulations to all the teams advancing.

0 replies · 6 reposts · 40 likes · 1.7K views
⚡🛡️ Evan Pappas retweeted
templar
templar@tplr_ai·
(same post as quoted above)
208 replies · 954 reposts · 6.2K likes · 1.8M views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
Agents can just do things cc @tplr_ai, @basilic_ai. Told it to improve on heterogeneous sparse LoCo and gave it a Basilica API key. Arby is creating SOTA
Distributed State tweet media
1 reply · 7 reposts · 38 likes · 3.6K views
⚡🛡️ Evan Pappas
⚡🛡️ Evan Pappas@Hevalon·
Neat architecture template by OWASP: threat modeling for LLM/agentic apps.
⚡🛡️ Evan Pappas tweet media
0 replies · 1 repost · 3 likes · 83 views