Pluralis Research (@Pluralis) - Twitter Profile

Pinned Tweet

Today we're releasing Agora: the first ever pretraining stack that allows non-co-located consumer GPUs to be competitive with centralized clusters. Agora is 15x faster than Megatron-LM in this setting and only 1.5x less efficient, in terms of tokens per unit compute, than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

Pluralis Research retweeted

Hadi M. Dolatabadi@hmdolatabadi·3d

An overlooked property of Subspace Networks (SSNs), which we've been using in Agora and which I'm personally fond of, is that they can be transformed into a regular, full-rank model without the compression heads. You can simply fold the low-rank compressors into the projection matrices, so you don't have to keep special compression heads around at inference time. This is an important property for decentralised training infra: communication-saving architecture changes are much more useful if they don't lock the final model into a nonstandard format.
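
The folding idea in the post above can be illustrated with a minimal sketch. This is not the actual SSN formulation: the compression head `C`, the decompression head `D`, the matrix sizes, and the layout "projection, then compress, then decompress" are all assumptions made for illustration. The sketch only shows that a chain of linear maps collapses into one matrix of the ordinary projection shape.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix, used here only as placeholder weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out, k = 6, 8, 2            # hypothetical sizes, with k much smaller than d_out

W = rand_matrix(d_out, d_in, rng)   # ordinary projection matrix
C = rand_matrix(k, d_out, rng)      # hypothetical compression head (d_out -> k)
D = rand_matrix(d_out, k, rng)      # hypothetical decompression head (k -> d_out)

def forward_with_heads(x_col):
    """Training-time path: only the k compressed values would cross the network."""
    z = matmul(C, matmul(W, x_col))
    return matmul(D, z)

# Fold the heads into the projection once; the result has the ordinary (d_out, d_in) shape.
W_folded = matmul(D, matmul(C, W))

def forward_folded(x_col):
    """Inference-time path: a single standard matrix multiply, no special heads."""
    return matmul(W_folded, x_col)

x = rand_matrix(d_in, 1, rng)       # one input column vector
y_heads = forward_with_heads(x)
y_folded = forward_folded(x)
max_abs_diff = max(abs(a[0] - b[0]) for a, b in zip(y_heads, y_folded))
print("max abs difference:", max_abs_diff)
```

The two paths agree up to floating-point rounding, which is why, under these assumptions, the exported model needs no extra components beyond a standard-shaped weight matrix.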

Pluralis Research@Pluralis·2d

The workshop also includes a poster session featuring the following work:
• @itsmaddox_j – LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
• Xingyu Qu – Can Muon Fine-tune Adam-Pretrained Models?
• @benjamintherien – MuLoCo: Muon is a practical inner optimizer for DiLoCo
• Jin Lee – SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining
• @sungbin_shin – Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation
• Jeffrey T. H. Wong – A3: an Analytical Low-Rank Approximation Framework for Attention
pluralis.ai/events/icml-pr…

Pluralis Research@Pluralis·2d

We are hosting a workshop during ICML on Protocol Learning to examine the key open challenges in collaborative training across distributed networks. Talks by @Ana_koloskova, @sam_hrvth, and @hmdolatabadi, with more speakers to be announced. Register here: pluralis.ai/events/icml-pr…

Pluralis Research@Pluralis·3d

@dmihal @ChaminHewa arxiv.org/pdf/2606.22768

David Mihal.eth@dmihal·17 Jun

@Pluralis @ChaminHewa Where's the link to the paper?

Pluralis Research@Pluralis·1 May

Factored Gossip DiLoCo (by @ChaminHewa) has been accepted to ICML 2026. It removes the all-reduce required to compute the outer-optimiser step, improving robustness to failed nodes. In a collective training setting, this allows nodes to leave arbitrarily with minimal impact.
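
To give a feel for why dropping the all-reduce helps with node failures, here is a minimal sketch of generic randomized pairwise gossip averaging. It is not the Factored Gossip DiLoCo algorithm, whose details are not given in the post; the function `gossip_round`, the scalar "parameters", and the node count are all hypothetical. The point is only that pairwise exchanges let the surviving nodes reach agreement without any step that requires every node to participate.

```python
import random

def gossip_round(values, alive, rng):
    """Randomly pair up the alive nodes; each pair replaces both values with their mean."""
    ids = [i for i, ok in enumerate(alive) if ok]
    rng.shuffle(ids)
    # With an odd number of alive nodes, the last one simply sits this round out.
    for a, b in zip(ids[0::2], ids[1::2]):
        values[a] = values[b] = (values[a] + values[b]) / 2.0

rng = random.Random(0)
values = [float(i) for i in range(8)]   # one scalar stand-in "parameter" per node
alive = [True] * 8
alive[3] = False                        # node 3 leaves; nobody has to wait for it

def alive_values():
    """Current values held by the nodes that are still participating."""
    return [v for v, ok in zip(values, alive) if ok]

target_mean = sum(alive_values()) / len(alive_values())

for _ in range(200):
    gossip_round(values, alive, rng)

spread = max(alive_values()) - min(alive_values())
final_mean = sum(alive_values()) / len(alive_values())
print("spread among alive nodes:", spread)
print("mean before and after:", target_mean, final_mean)
```

Each pairwise average preserves the sum over the alive nodes, so they converge to the same mean a global all-reduce would have produced, while the departed node is simply never selected.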

Pluralis Research retweeted

Erfan Miahi@erfan_mhi·15 Jun

After what Anthropic just did, it's clear that the only way to make sure AI is good for humanity is decentralized AI. So I decided to join @Pluralis as a research scientist to build models no one can own or switch off, building on my work making RL weight sync ~100x more efficient (now in trl, slime, composer 2), as well as other contributions to the field like Covenant72b.

Pluralis Research retweeted

Alexander Long@AlexanderLong·13 Jun

Systems like this must exist. This is the way out.

Pluralis Research@Pluralis

The 8B model currently training on Agora is 350B tokens in and continuing to converge. The top-level metrics and evals look almost exactly like a centralised run. But:
- 133 external contributors in total, bringing 4090s, 5090s, L40S/RTX 6000s and RTX 6000 Pros. These are cards that people actually own; there are no H100s, B200s, etc.
- The maximum number of nodes the system can support (104) was filled almost immediately. The authorization layer is receiving approximately 100 requests per minute to join.
- The total tokens per second processed moves directly with the amount of compute in the swarm, with Agora constantly optimising to make the most efficient use of whatever hardware is present.
- MFU is approximately 20% and TPS is 170k tok/s. There are near-constant communication failures, which Agora is absorbing completely without slowdown.
- The system is effectively on autopilot, requiring very little intervention from us. Bad nodes are purged immediately, before training is affected, and new nodes take their place.
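
As a rough sanity check on the throughput figures in the post above, the following back-of-the-envelope calculation relates TPS, MFU, and model size. The 8B, 170k tok/s, 20%, and 350B values come from the post; the "6 FLOPs per parameter per token" rule is a common approximation for dense transformer training and is an assumption here, not something the post states. Throughput is also treated as constant, which the post says it is not.

```python
# Figures quoted in the post.
n_params = 8e9               # 8B-parameter model
tokens_per_second = 170e3    # reported TPS
mfu = 0.20                   # reported MFU (approximate)
tokens_so_far = 350e9        # tokens processed so far

# ASSUMPTION (not from the post): about 6 FLOPs per parameter per training token.
FLOPS_PER_PARAM_PER_TOKEN = 6

# Useful model FLOP/s actually delivered by the swarm under that assumption.
achieved_flops = FLOPS_PER_PARAM_PER_TOKEN * n_params * tokens_per_second

# MFU = achieved / peak, so the aggregate peak implied by the reported MFU is:
implied_peak_flops = achieved_flops / mfu

# Time to process the tokens so far if throughput had been constant (a simplification).
days_at_constant_rate = tokens_so_far / tokens_per_second / 86_400

print(f"achieved:     {achieved_flops / 1e15:.2f} PFLOP/s")
print(f"implied peak: {implied_peak_flops / 1e15:.2f} PFLOP/s")
print(f"days at constant rate: {days_at_constant_rate:.1f}")
```

Under these assumptions the reported numbers correspond to roughly 8 PFLOP/s of useful compute, about 41 PFLOP/s of aggregate peak hardware throughput, and a little over three weeks of training at a steady 170k tok/s.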

Pluralis Research retweeted

Alexander Long@AlexanderLong·13 Jun

I would like to make a few brief points:
- Open-source AI is not the same thing as open-source software. The models cost tens to hundreds of millions to make. This is not gonna be a volunteer effort from people doing stuff after work for free.
- The second you release a weight set, you lose any ability to make money serving your own model and recoup the training cost. This very simple property means open weights are unsustainable.
- The things you ACTUALLY want from open-source AI are: transparent behaviour, dispersed ownership and control, a guarantee of access, the ability to build on it/modify it, and privacy. Protocol learning gets you all of these and is the only alternative to closed models that makes any kind of sense.

By protocol learning I mean a very specific, novel thing: collaborative training and development of the models without anyone ever being able to see the complete weight set.

Pluralis Research@Pluralis·13 Jun

Watershed moment.

Anthropic@AnthropicAI

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Pluralis Research retweeted

Alexander Long@AlexanderLong·9 Jun

The single most immediate, impactful downside to AI is concentration of power risk. This is 1/100th of how bad it's going to get. The only way out of this is to have an independent model supply chain via pooled compute.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community. also the fact that this is on purpose and not visible to the user is crazy

Pluralis Research@Pluralis·8 Jun

[Post with no text content]

Pluralis Research@Pluralis·8 Jun

@OnWavs @kelxyz_ @tha_ajanthan This is great -- love to see community members building these pieces of infra. Let us know if you need anything to roll it out!

walkonwayvs@OnWavs·6 Jun

@kelxyz_ @tha_ajanthan @Pluralis I made a dashboard to track my own nodes on Pluralis. Hopefully going to make one public for others to use also soon.

Thalaiyasingam Ajanthan@tha_ajanthan·25 May

Imagine being able to collectively train (and own) an LLM on all of these GPUs. This is exactly what we aim to do @Pluralis. See the current live run at agora.pluralis.ai

clem 🤗@ClementDelangue

300,000 AI builders filled their hardware profile on @huggingface and we're sharing the results: hf.co/hardware. Excited to see how it evolves in the coming months especially with the explosion of local AI!

Pluralis Research retweeted

Alexander Long@AlexanderLong·31 May

Whole section on Pluralis in Chamath's substack this week right under details of Anthropic's monster round.

Pluralis Research retweeted

crux@macrocrux·22 May

Something very important is being brought into existence right now. Bricks have been laid over the last 18 months and now the tech is coming together in a way that makes commercialization possible. If this shit works, it will completely disrupt the economics of training large models and the floodgates will burst open. @Pluralis and @MacrocosmosAI are the only teams who I think can clearly see the shape of this opportunity right now. Agora is a strong first step towards this future. After spending a bit of time on their platform there's a form factor to it which feels "natural", almost inevitable in hindsight. This subfield of training is really starting to take shape. Our IOTA team has been very, very busy for the last few months. Can't wait to share more soon.

Pluralis Research@Pluralis

Today we're releasing Agora: the first ever pretraining stack that allows non-co-located consumer GPUs to be competitive with centralized clusters. Agora is 15x faster than Megatron-LM in this setting and only 1.5x less efficient, in terms of tokens per unit compute, than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

Pluralis Research retweeted

kel.@kelxyz_·22 May

x.com/i/article/2051…

Pluralis Research@Pluralis·20 May

More details, code, and join instructions: pluralis.ai/docs
A live view of the run that just started: agora.pluralis.ai

Pluralis Research@Pluralis·20 May

There are still many limitations:
- Agora is not yet self-serve
- We currently cannot use nodes outside North America
- We’re restricted to our current model architecture
- We still need to operate approximately 30% of the swarm ourselves

However, the fact that this works at all is significant, and these restrictions are engineering constraints rather than fundamental limitations.
