Pluralis Research

184 posts

@Pluralis

Pluralis is a research lab focused on collectively-owned AI.

Joined July 2024
44 Following, 13.1K Followers
Pinned Tweet
Pluralis Research (@Pluralis)
Today we're releasing Agora: the first-ever pretraining stack that allows non-collocated consumer GPUs to be competitive with centralized clusters. Agora is 15x faster than Megatron-LM in this setting and is only 1.5x less efficient in terms of tokens per unit compute than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.
Pluralis Research retweeted
Hadi M. Dolatabadi (@hmdolatabadi)
An overlooked property of Subspace Networks (SSNs), which we've been using in Agora and which I'm personally fond of, is that they can be transformed into a regular, full-rank model without the compression heads. You can simply fold the low-rank compressors into the projection matrices, so you don't have to keep special compression heads around at inference time. This is an important property for decentralised training infra: communication-saving architecture changes are much more useful if they don't lock the final model into a nonstandard format.
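The folding step can be illustrated with plain linear algebra. The sketch below assumes, purely for illustration, that a compression head is a pair of linear maps, a compressor `C` (r × d) and a decompressor `E` (d × r), applied directly after a projection `W` (d × k) with no nonlinearity in between. These names and shapes are hypothetical and are not taken from the SSN implementation. Under that assumption the three matrices collapse into a single dense d × k matrix with the same shape as `W`.

```python
import numpy as np


def fold_compressor(W: np.ndarray, C: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Collapse projection W, compressor C and decompressor E into one matrix.

    Shapes (hypothetical): W is (d, k), C is (r, d), E is (d, r).
    Applying W, then C, then E to a column vector x equals (E @ C @ W) @ x.
    """
    return E @ C @ W


rng = np.random.default_rng(0)
d, k, r = 64, 32, 8
W = rng.standard_normal((d, k))  # ordinary projection matrix
C = rng.standard_normal((r, d))  # low-rank compressor (used for communication)
E = rng.standard_normal((d, r))  # maps the compressed activations back to width d
x = rng.standard_normal((5, k))  # batch of 5 inputs, one per row

# Training-time path: project, compress to r dims, decompress.
y_three_step = ((x @ W.T) @ C.T) @ E.T

# Inference-time path: a single dense layer with the folded weight.
W_folded = fold_compressor(W, C, E)
y_folded = x @ W_folded.T
```

Because the folded matrix has the same shape as the original projection, the layer can be exported in a standard format, and the extra matrix multiplications become a one-off cost at export time rather than a per-token cost at inference.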
Pluralis Research (@Pluralis)
The workshop also includes a poster session featuring the following work:
• @itsmaddox_j – LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
• Xingyu Qu – Can Muon Fine-tune Adam-Pretrained Models?
• @benjamintherien – MuLoCo: Muon is a practical inner optimizer for DiLoCo
• Jin Lee – SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining
• @sungbin_shin – Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation
• Jeffrey T. H. Wong – A3: an Analytical Low-Rank Approximation Framework for Attention
pluralis.ai/events/icml-pr…
Pluralis Research (@Pluralis)
Factored Gossip DiLoCo (by @ChaminHewa) has been accepted to ICML 2026. It removes the all-reduce required to compute the outer-optimiser step, improving robustness to failed nodes. In a collective training setting, this allows nodes to leave arbitrarily with minimal impact.
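The post does not describe how the factored scheme works, so the sketch below shows only the general idea it builds on: replacing a global all-reduce of DiLoCo pseudo-gradients with local gossip averaging. It is plain ring gossip in NumPy, not the paper's method; `all_reduce_mean`, `gossip_round`, the ring topology and the `alive` mask are hypothetical illustrations. An all-reduce needs every node to participate, whereas a gossip round lets each surviving node average with whichever neighbours are still present.

```python
import numpy as np


def all_reduce_mean(deltas: np.ndarray) -> np.ndarray:
    """Global all-reduce: every node ends up with the mean over ALL nodes.

    This is a single collective, so it stalls if any participant disappears.
    """
    mean = deltas.mean(axis=0)
    return np.tile(mean, (len(deltas), 1))


def gossip_round(deltas: np.ndarray, alive: np.ndarray) -> np.ndarray:
    """One round of ring gossip (hypothetical illustration, not Factored Gossip DiLoCo).

    deltas: (n, p) per-node pseudo-gradients, e.g. theta_outer - theta_local.
    alive:  (n,) boolean mask; failed nodes are skipped instead of blocking the round.
    """
    n = len(deltas)
    out = deltas.copy()
    for i in range(n):
        if not alive[i]:
            continue  # a failed node keeps its stale value and is ignored by peers
        # Closed ring neighbourhood {i-1, i, i+1}, restricted to live nodes.
        group = [j for j in sorted({(i - 1) % n, i, (i + 1) % n}) if alive[j]]
        out[i] = deltas[group].mean(axis=0)
    return out


rng = np.random.default_rng(0)
deltas = rng.standard_normal((6, 4))                      # 6 nodes, 4 parameters each
alive = np.array([True, True, False, True, True, True])  # node 2 has left the swarm

state = deltas
for _ in range(500):
    state = gossip_round(state, alive)  # no global collective is ever needed
```

Repeated rounds drive the surviving nodes to agreement without any global collective. In this toy the consensus value is a degree-weighted average of the live nodes rather than their exact mean, because nodes next to the failed one average over fewer peers; schemes that need the exact mean use doubly stochastic mixing weights instead.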
Pluralis Research retweeted
Erfan Miahi (@erfan_mhi)
After what Anthropic just did, it's clear that the only way to make sure AI is good for humanity is decentralized AI. So I decided to join @Pluralis as a research scientist to build models no one can own or switch off, building on my work making RL weight sync ~100x more efficient (now in trl, slime, composer 2), as well as other contributions to the field like Covenant72b.
Pluralis Research retweeted
Pluralis Research (@Pluralis)
The 8B model currently training on Agora is 350B tokens in and continuing to converge. The top-level metrics and evals look almost exactly like a centralised run. But:
- 133 external contributors in total, bringing 4090s, 5090s, L40S/RTX 6000 and RTX 6000 Pros. These are cards that people actually own; there are no H100s, B200s, etc.
- The maximum number of nodes the system can support (104) was filled almost immediately. The authorization layer is receiving approximately 100 requests/minute to join.
- The total tokens per second processed moves directly with the amount of compute in the swarm, with Agora constantly optimising to make the most efficient use of whatever hardware is present.
- MFU is approximately 20%, and TPS is 170k tok/s. There are near-constant communication failures, which Agora absorbs completely without slowdown.
- The system is effectively on autopilot, requiring very little intervention from us. Bad nodes are purged immediately, before training is affected, and new nodes take their place.
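The throughput figures can be cross-checked with a back-of-the-envelope calculation. This sketch assumes the common approximation of about 6N training FLOPs per token for an N-parameter dense model (forward plus backward pass, attention FLOPs ignored) and takes the "8B" label literally as N = 8 × 10^9. The post does not say how Pluralis computes MFU or which peak figure it uses, so the implied swarm peak below is an estimate under these assumptions, not a reported number.

```python
def achieved_flops_per_second(n_params: float, tokens_per_second: float) -> float:
    """Approximate useful training FLOP/s using the 6*N FLOPs-per-token rule.

    Assumption: dense model, forward + backward pass, attention FLOPs ignored.
    """
    return 6.0 * n_params * tokens_per_second


def implied_peak_flops(achieved: float, mfu: float) -> float:
    """MFU = achieved / peak, so peak = achieved / MFU."""
    return achieved / mfu


N = 8e9       # assumed: the "8B" label taken literally
TPS = 170e3   # 170k tok/s, from the post
MFU = 0.20    # "approximately 20%", from the post

achieved = achieved_flops_per_second(N, TPS)
peak = implied_peak_flops(achieved, MFU)
print(f"achieved ≈ {achieved:.3g} FLOP/s, implied swarm peak ≈ {peak:.3g} FLOP/s")
```

Under these assumptions, 170k tok/s corresponds to roughly 8.2 × 10^15 FLOP/s of useful work, and an MFU of 20% implies an aggregate swarm peak on the order of 4 × 10^16 FLOP/s.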
Pluralis Research retweeted
Alexander Long (@AlexanderLong)
I would like to make a few brief points:
- Open-source AI is not the same thing as open-source software. The models cost tens to hundreds of millions to make. This is not gonna be a volunteer effort from people doing stuff after work for free.
- The second you release a weight set, you lose any ability to make money serving your own model and recoup the training cost. This very simple property means open-weights is unsustainable.
- The things you ACTUALLY want from open-source AI are: transparent behaviour, dispersed ownership and control, a guarantee of access, the ability to build on it/modify it, and privacy. Protocol learning gets you all 4 and is the only alternative to closed models that makes any kind of sense.
By protocol learning I mean a very specific, novel thing: collaborative training and development of the models without anyone ever being able to see the complete weight set.
Pluralis Research retweeted
Alexander Long (@AlexanderLong)
The single most immediate, impactful downside to AI is concentration of power risk. This is 1/100th of how bad it's going to get. The only way out of this is to have an independent model supply chain via pooled compute.
Quoting elie (@eliebakouch):

Mythos will be bad ON PURPOSE on AI "frontier LLM research" tasks. This is very, very sad for the research community. Also, the fact that this is on purpose and not visible to the user is crazy.

Pluralis Research retweeted
Alexander Long (@AlexanderLong)
Whole section on Pluralis in Chamath's Substack this week, right under details of Anthropic's monster round.
Pluralis Research retweeted
crux (@macrocrux)
Something very important is being brought into existence right now. Bricks have been laid over the last 18 months and now the tech is coming together in a way that makes commercialization possible. If this shit works, it will completely disrupt the economics of training large models and the floodgates will burst open. @Pluralis and @MacrocosmosAI are the only teams who I think can clearly see the shape of this opportunity right now. Agora is a strong first step towards this future. After spending a bit of time on their platform there's a form factor to it which feels "natural", almost inevitable in hindsight. This subfield of training is really starting to take shape. Our IOTA team has been very, very busy for the last few months. Can't wait to share more soon.
Pluralis Research (@Pluralis)
There are still many limitations:
- Agora is not yet self-serve
- We currently cannot use nodes outside North America
- We're restricted to our current model architecture
- We still need to operate approximately 30% of the swarm ourselves
However, the fact that this works at all is significant, and these restrictions are engineering constraints rather than fundamental limitations.