Distributed State

9.5K posts

Distributed State
@DistStateAndMe

Founder @covenant_ai (templar, basilica, grail)

Subnets 3/39/81 · Joined April 2014
2.6K Following · 4K Followers
Pinned Tweet
Distributed State
Distributed State@DistStateAndMe·
A small step for mankind, a massive leap for decentralised training... for agency. In the space of 9 months, @tplr_ai went from 1.2B -> 72B. It's never been easy, and has broken everyone on the team multiple times. But I speak for all of us when I say it is the most rewarding thing we have ever done.

We have a fraction of the resources. We don't have the PhDs. But Bittensor shows you it doesn't matter. Innovation happens at the edge. We innovate through scarcity. The ones who rewrite the rules are never the ones with the most. They're the ones who refuse to accept the limits they were handed.

Bittensor is prophecy. Subnets (@covenant_ai and others) are the tools through which that prophecy is manifested. Next stop: TRILLIONS.
templar@tplr_ai

We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n

18 replies · 33 reposts · 249 likes · 18.1K views
Distributed State reposted
Carl Jung Archive
Carl Jung Archive@QuoteJung·
Carl Jung was not playing around when he wrote: “No matter how isolated you are and how lonely you feel, if you do your work truly and conscientiously, unknown allies will come and seek you.”
40 replies · 1.5K reposts · 12K likes · 180K views
Distributed State reposted
Chamath Palihapitiya
Chamath Palihapitiya@chamath·
Jensen Pod!!!!!!
The All-In Podcast@theallinpod

🚨MAJOR INTERVIEW: Jensen Huang joins the Besties! The @nvidia CEO joins to discuss:
-- Nvidia's future, roadmap to $1T revenue
-- Physical AI's $50T market
-- Rise of the agent, OpenClaw's inflection moment
-- Inference explosion, Groq deal
-- AI PR crisis, Anthropic's comms mistakes
-- Token allocation for employees
++ much more!
(0:00) Jensen Huang joins the show!
(0:26) Acquiring Groq and the inference explosion
(8:53) Decision making at the world's most valuable company
(10:47) Physical AI's $50T market, OpenClaw's future, the new operating system for modern AI computing
(16:38) AI's PR crisis, refuting doomer narratives, Anthropic's comms mistakes
(20:48) Revenue capacity, token allocation for employees, Karpathy's autoresearch, agentic future
(30:50) Open source, global diffusion, Iran/Taiwan supply chain impact
(39:45) Self-driving platform, facing competition from active customers, responding to growth slowdown predictions
(47:32) Datacenters in space, AI healthcare, Robotics
(56:10) OpenAI/Anthropic revenue potential, how to build an AI moat
(59:04) Advice to young people on excelling in the AI era

60 replies · 59 reposts · 813 likes · 102.1K views
Distributed State reposted
Openτensor Foundaτion
Openτensor Foundaτion@opentensor·
The largest decentralised LLM pre-training run in history. SN3 @tplr_ai trained Covenant-72B across 70+ contributors on open internet infrastructure. Now it’s being discussed by @chamath with @nvidia CEO Jensen Huang. Distributed, open-weight model training on Bittensor is getting started.
55 replies · 311 reposts · 1.4K likes · 67.7K views
Algod
Algod@AlgodTrading·
Slowly, then all at once
templar@tplr_ai

On the @theallinpod this week, @chamath asked @nvidia CEO Jensen Huang about decentralized AI training, calling our Covenant-72B run "a pretty crazy technical accomplishment." One correction: it's 72 billion parameters, not four. Trained permissionlessly across 70+ contributors on commodity internet. The largest model ever pre-trained on fully decentralized infrastructure. Jensen's answer is worth hearing too.

17 replies · 36 reposts · 387 likes · 25.8K views
Swamination
Swamination@Swamination·
Keep cooking.
templar@tplr_ai

2 replies · 2 reposts · 10 likes · 316 views
Distributed State reposted
Lisa
Lisa@chieftplr_ai·
31:44 - @DistStateAndMe @covenant_ai @tplr_ai * 72 billion parameter model with decentralized training, not a 4 billion parameter model
The All-In Podcast@theallinpod

1 reply · 2 reposts · 9 likes · 706 views
Distributed State reposted
Mark Jeffrey
Mark Jeffrey@markjeffrey·
Bittensor peeps: check out 31:44 - Templar sn3 discussed. @chamath -- they've achieved a *72* billion parameter model with decentralized training, not a 4 billion parameter model :)
17 replies · 80 reposts · 324 likes · 49.9K views
Distributed State reposted
grail
grail@grail_ai·
PULSE made weight sync 100x faster. That turned the trainer itself into the bottleneck. @erfan_mhi just fixed that too. Grail's GRPO trainer is now 1.8x faster on a single B200: 27% to 47% MFU, epoch time nearly halved. Decentralized post-training is converging on centralized speed.
Erfan Miahi@erfan_mhi

Used autoresearch to make the @grail_ai GRPO trainer 1.8x faster on a single B200. I kept postponing this for weeks since the bottleneck in our decentralized framework was mainly communication. But after our proposed technique, PULSE, made weight sync 100x faster, the training update itself became the bottleneck. Even with a fully async trainer and inference, a slow trainer kills convergence speed.

A task that could've eaten days of my time ran in parallel while I worked on other stuff. Unlike original autoresearch, where each experiment is 5 min, our feedback loop is way longer (10-17 min per epoch, plus 10-60 minutes of installations and code changes), so I did minimal steering when it was heading in bad directions to avoid burning GPU hours.

The agent tried many things that failed, but eventually found the wins: Liger kernel, sequence packing, token-budget dynamic batching, and native FA4 via AttentionInterface. 27% to 47% MFU. 16.7 min to 9.2 min per epoch.

If you wanna dig deeper or contribute: github.com/tplr-ai/grail. We're optimizing everything at the scale of global nodes to make decentralized post-training as fast as centralized. Stay tuned for some cool models coming out of this effort. Cheers!

0 replies · 10 reposts · 42 likes · 8.1K views
Distributed State
Distributed State@DistStateAndMe·
When you fix one bottleneck, the next one becomes visible. At @covenant_ai we built PULSE (arxiv.org/abs/2602.03839) to make weight sync 100× faster. That worked. Then the trainer itself became the new ceiling. So @erfan_mhi ran autoresearch on our GRPO trainer. 27% → 47% MFU. 16.7 min → 9.2 min per epoch. 1.8× faster on a single B200. Decentralized post-training, closing the gap with centralized. github.com/tplr-ai/grail
Erfan Miahi@erfan_mhi

4 replies · 16 reposts · 103 likes · 6.8K views
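One of the wins credited in the thread above is token-budget dynamic batching: instead of a fixed number of sequences per batch, sequences are grouped so the padded token count stays under a budget, cutting wasted compute on padding. The sketch below is illustrative only, under assumed behavior; `token_budget_batches` and its greedy longest-first packing are not taken from the grail codebase.

```python
def token_budget_batches(seq_lens, token_budget):
    """Greedily pack sequences (given by length) into batches whose
    padded size (batch_size * max_len) never exceeds token_budget."""
    # Sort longest-first so each batch's max length is set by its first
    # element, which keeps the padded-size check cheap.
    order = sorted(range(len(seq_lens)), key=lambda i: -seq_lens[i])
    batches, current, max_len = [], [], 0
    for i in order:
        new_max = max(max_len, seq_lens[i])
        if current and new_max * (len(current) + 1) > token_budget:
            batches.append(current)
            current, max_len = [], 0
            new_max = seq_lens[i]
        current.append(i)
        max_len = new_max
    if current:
        batches.append(current)
    return batches

# Example: a 4096-token budget over mixed-length rollouts.
lens = [1024, 1000, 512, 480, 256, 128, 96]
for batch in token_budget_batches(lens, 4096):
    padded = len(batch) * max(lens[i] for i in batch)
    print([lens[i] for i in batch], "padded tokens:", padded)
```

With a fixed batch size, the shortest sequences in a batch pay for the longest one's padding; with a token budget, short sequences pack densely and long ones get small batches, which is one way MFU improves without touching the model.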
Distributed State
Distributed State@DistStateAndMe·
@zacodil Why do you hate Bittensor? It's pretty confusing. I don't read this and get the sudden urge to FUD NEAR. It should never be PvP. The mission is greater than petty squabbles. We are not the enemy
0 replies · 0 reposts · 0 likes · 15 views
Vadim
Vadim@zacodil·
Stop scrolling - this changes how AI makes money. Illia Polosukhin is speaking today at NVIDIA GTC - and this one actually matters. He's not retelling Transformer history. He's laying out something bigger: a blueprint for how AI agents trade, settle, and resolve disputes with each other. Programmatic escrow. Intent-based matching. Agent-run arbitration. The core idea: today's markets are built for humans - our biases, delays, and legal friction. But when AI agents become the main economic actors? Everything breaks. You don't tweak the system. You rebuild it from scratch. That's what NEAR Protocol is already moving toward:
– Intents layer
– AI Agent Market
– Private transactions for agents
This talk is the theory behind it all. Transformer co-author. Agent economies. On Jensen Huang's stage. The infrastructure for an agent economy is starting to take shape.
5 replies · 1 repost · 38 likes · 1.3K views
Distributed State reposted
Grigory Sapunov
Grigory Sapunov@che_shr_cat·
1/ The standard x + f(x) residual connection is the bedrock of modern architectures. It is also a massive bottleneck. Unweighted accumulation causes state magnitudes to grow linearly, diluting early layers and capping efficient depth scaling. 🧵
1 reply · 4 reposts · 69 likes · 4.9K views
Leadpoet
Leadpoet@LeadpoetAI·
Introducing Leadpoet. The AI agent that delivers ready-to-buy prospects on demand. Your next customer is already looking for your solution. Leadpoet finds them. Comment “Poet” and we’ll send you 100 free lead credits for your ICP.
687 replies · 104 reposts · 721 likes · 675.3K views
Jasmine
Jasmine@jasminervaa·
@infinitetensor @DistStateAndMe Will publish an English version tomorrow 🤝 I originally thought there were already too many English articles, and that not many Chinese readers are familiar with $TAO
1 reply · 0 reposts · 5 likes · 178 views
Distributed State reposted
Mars
Mars@infinitetensor·
The evolution of decentralized training:
2022 — Together GPT-JT (6B): proving multi-machine collab is possible
2023 — SWARM Intelligence (~1B): proposed a heterogeneous-node collaborative training framework
2024 — INTELLECT-1 (10B): decentralized training across whitelisted peers
2026 — @covenant_ai-72B / SN3 @tplr_ai: the first 72B model trained decentrally to outperform centralized training on mainstream benchmarks
This article is worth translating to English. When Bittensor was created, no one knew decentralized training was possible, the models of the day were full of hallucinations, and no one felt the threat of job loss. @DistStateAndMe well done
0xai@0xai_dev

x.com/i/article/2033…

1 reply · 3 reposts · 25 likes · 3.2K views
George
George@georgecurtiss·
you’ve got to be fucking retarded to build your own database
85 replies · 5 reposts · 758 likes · 65.2K views