⚡🛡️ Evan Pappas

6.7K posts


@Hevalon

🛡️ Ex Technologia Libertas - "Freedom through technology" - (Dec/Acc)

epappas.eth · Joined September 2009
4K Following · 1.3K Followers
Ken Jon
Ken Jon@kenjon·
99.7% confirmed real by @bitmind. The 0.3% was the clip saying the model was 4B vs the actual 72B size. it happens
Ken Jon tweet media
templar@tplr_ai

On the @theallinpod this week, @chamath asked @nvidia CEO Jensen Huang about decentralized AI training, calling our Covenant-72B run "a pretty crazy technical accomplishment." One correction: it's 72 billion parameters, not four. Trained permissionlessly across 70+ contributors on commodity internet. The largest model ever pre-trained on fully decentralized infrastructure. Jensen's answer is worth hearing too.

10 replies · 18 reposts · 183 likes · 11.2K views
⚡🛡️ Evan Pappas retweeted
templar
templar@tplr_ai·
(same post as quoted above)
68 replies · 322 reposts · 1.4K likes · 276.5K views
⚡🛡️ Evan Pappas retweeted
Mark Jeffrey
Mark Jeffrey@markjeffrey·
Bittensor peeps: check out 31:44 - Templar sn3 discussed. @chamath -- they've achieved a *72* billion parameter model with decentralized training, not a 4 billion parameter model :)
17 replies · 80 reposts · 324 likes · 49K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
When you fix one bottleneck, the next one becomes visible. At @covenant_ai we built PULSE (arxiv.org/abs/2602.03839) to make weight sync 100× faster. That worked. Then the trainer itself became the new ceiling. So @erfan_mhi ran autoresearch on our GRPO trainer. 27% → 47% MFU. 16.7 min → 9.2 min per epoch. 1.8× faster on a single B200. Decentralized post-training, closing the gap with centralized. github.com/tplr-ai/grail
Erfan Miahi@erfan_mhi

Used autoresearch to make @grail_ai GRPO trainer 1.8x faster on a single B200.

I kept postponing this for weeks since the bottleneck in our decentralized framework was mainly communication. But after our proposed technique, PULSE, made weight sync 100x faster, the training update itself became the bottleneck. Even with a fully async trainer and inference, a slow trainer kills convergence speed.

A task that could've eaten days of my time ran in parallel while I worked on other stuff. Unlike original autoresearch, where each experiment is 5 min, our feedback loop is way longer (10-17 min per epoch + 10-60 minutes of installations and code changes), so I did minimal steering when it was heading in bad directions to avoid burning GPU hours.

The agent tried so many things that failed. But it eventually found the wins: Liger kernel, sequence packing, token-budget dynamic batching, and native FA4 via AttentionInterface. 27% to 47% MFU. 16.7 min to 9.2 min per epoch.

If you wanna dig deeper or contribute: github.com/tplr-ai/grail

We're optimizing everything at the scale of global nodes to make decentralized post-training as fast as centralized ones. Stay tuned for some cool models coming out of this effort. Cheers!

4 replies · 16 reposts · 103 likes · 6.8K views
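One of the wins listed above, token-budget dynamic batching, is simple to sketch: rather than a fixed batch size, sequences are grouped so each step stays under a token budget, keeping per-step work and memory roughly constant. A minimal illustration in Python; the function name and the greedy grouping are assumptions for illustration, not code from the grail repo.

```python
def token_budget_batches(seq_lens, max_tokens):
    """Greedily group sequence indices so each batch stays within a token budget."""
    batches, current, current_tokens = [], [], 0
    # Visiting sequences shortest-first also pairs naturally with sequence
    # packing, since similar-length neighbours minimise padding waste.
    for idx in sorted(range(len(seq_lens)), key=lambda i: seq_lens[i]):
        n = seq_lens[idx]
        if current and current_tokens + n > max_tokens:
            batches.append(current)          # budget exceeded: start a new batch
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Long sequences land in small batches, short ones in large batches:
print(token_budget_batches([512, 128, 1024, 256, 64], max_tokens=1024))
# → [[4, 1, 3, 0], [2]]
```

The effect is the one the tweet describes: token throughput per step becomes nearly constant regardless of the sequence-length distribution, which is part of what pushes MFU up.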
⚡🛡️ Evan Pappas retweeted
Erfan Miahi
Erfan Miahi@erfan_mhi·
(same post as quoted above)
Erfan Miahi tweet media
1 reply · 14 reposts · 58 likes · 14.4K views
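The reported figures are also internally consistent, reading MFU in its usual sense (achieved model FLOPs divided by hardware peak), so that with fixed work per epoch, wall-clock time scales inversely with MFU:

```python
# Cross-check: wall-clock speedup vs. MFU gain from the figures above.
# Same model and tokens per epoch => time per epoch ~ 1 / MFU.
epoch_speedup = 16.7 / 9.2      # minutes per epoch, before / after
mfu_gain = 0.47 / 0.27          # MFU after / MFU before

print(round(epoch_speedup, 2))  # 1.82 -- the "1.8x faster" claim
print(round(mfu_gain, 2))       # 1.74 -- agrees to within measurement noise
```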
⚡🛡️ Evan Pappas
⚡🛡️ Evan Pappas@Hevalon·
OG miners joining in this TGIF
GIF
templar@tplr_ai

TGIF #29 tomorrow! The Covenant-72B thread reached well beyond Bittensor this week. @DistStateAndMe and the full @covenant_ai team talk about what that traction means, where decentralized AI sits in the broader conversation, and what comes next. Miners, come celebrate with us. We are opening the stage to anyone who contributed compute to the run or has a story to share. Request to speak and we will bring you up. x.com/i/spaces/1oKMv…

0 replies · 2 reposts · 13 likes · 667 views
⚡🛡️ Evan Pappas retweeted
Joel Lidin
Joel Lidin@joellidin·
The hardest part of training a 72B model over the internet with untrusted peers isn't the optimizer or the bandwidth. It's that you can't see what anyone is doing. Every decision about the run (learning rate, intervention timing, participation thresholds) is made partially blind.

That problem never goes away. You just get better at working around it. Having done the hands-on mining work first gave me a better sense of it.

Covenant-72B is the largest model ever pre-trained in a fully permissionless setting. And it holds up against centralized 70B models. arxiv.org/abs/2603.08163 | @covenant_ai
templar@tplr_ai

We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n

11 replies · 39 reposts · 352 likes · 31.5K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
A small step for mankind, a massive leap for decentralised training... for agency.

In the space of 9 months, @tplr_ai went from 1.2B -> 72B. It's never been easy, and has broken everyone on the team multiple times. But I speak for all of us when I say it is the most rewarding thing we have ever done.

We have a fraction of the resources. We don't have the PhDs. But Bittensor shows you it doesn't matter. Innovation happens at the edge. We innovate through scarcity. The ones who rewrite the rules are never the ones with the most. They're the ones who refuse to accept the limits they were handed.

Bittensor is prophecy. Subnets (@covenant_ai and others) are the tools through which that prophecy is manifested. Next stop: TRILLIONS.
templar@tplr_ai

(same post as quoted above)

18 replies · 33 reposts · 249 likes · 18.1K views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
There is no higher mission than growing Yuma Rao's garden. Happy to support the next generation of subnet owners through @basilic_ai
covenant@covenant_ai

Seven subnet ideas are heading to testnet, backed by @basilic_ai compute credits. Basilica sponsors the Ideathon because the best way to validate a subnet design is to run it, and compute should not be the bottleneck. Congratulations to all the teams advancing.

0 replies · 6 reposts · 40 likes · 1.7K views
⚡🛡️ Evan Pappas retweeted
templar
templar@tplr_ai·
(same post as quoted above)
208 replies · 954 reposts · 6.2K likes · 1.8M views
⚡🛡️ Evan Pappas retweeted
Distributed State
Distributed State@DistStateAndMe·
Agents can just do things cc @tplr_ai, @basilic_ai. Told it to improve on heterogeneous sparse LoCo and gave it a Basilica API key. Arby is creating SOTA
Distributed State tweet media
1 reply · 7 reposts · 38 likes · 3.6K views
⚡🛡️ Evan Pappas
⚡🛡️ Evan Pappas@Hevalon·
Neat architecture template by OWASP: threat modeling for LLM/agentic apps.
⚡🛡️ Evan Pappas tweet media
0 replies · 1 repost · 3 likes · 83 views