Alexander Long

473 posts

@AlexanderLong

Founder @Pluralis | ML PhD

Joined July 2023
1.3K Following · 3.2K Followers
Pinned Tweet
Alexander Long@AlexanderLong·
Since I started getting interested in ML I got it in my head that all I wanted to do was one smart thing that I could look back on and be satisfied that I did. Most papers are kinda bad even if they get accepted - the idea is very incremental, or it's just not that good an idea, or it doesn't really matter. I never was able to do this all through PhD or my time at Amazon. All the papers I did there got into various places, but I never really thought they were actually that good. And I'd pretty much given up on this because Pluralis meant I couldn't really devote enough time to research myself. But in February I decided I didn't care and spent two months focused on a specific problem that had been going round in my head for about a year that I felt we needed to solve, and the solution came to me, and @ChaminHewa picked it up and generalised the approach and ran a bunch of novel experiments I hadn't thought of, and pulled everything together into an actual paper. And yesterday we presented this work at NeurIPS. This is the first and probably only work I will ever do that for me feels like "ok that was GOOD". I don't care if it racks up a bunch of citations and disperses into the field or not, I don't care if someone repackages the ideas and takes all the credit for it, I don't care. For me there is an internal checkbox that just got ticked after more than ten years of trying. Anyone in ML will understand what I'm trying to say. Special day I'm going to remember for a long time.
Squid@airtightfish·
@kelxyz_ “A 1 GW AI data center can plausibly generate around $10B–$25B/year in compute revenue, depending on GPU type, utilization, and pricing.” So what’s 4 worth?
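For scale, here is the implied back-of-the-envelope arithmetic, assuming the quoted $10B–$25B/year-per-GW range simply scales linearly with capacity (the linear scaling and the snippet below are illustrative, not from the tweet):

# Rough scaling of the quoted revenue estimate (illustrative only).
low_per_gw, high_per_gw = 10e9, 25e9          # quoted $/year range for a 1 GW AI data center
capacity_gw = 4                               # the "what's 4 worth?" case

low, high = capacity_gw * low_per_gw, capacity_gw * high_per_gw
print(f"{capacity_gw} GW -> ${low/1e9:.0f}B-${high/1e9:.0f}B per year")   # 4 GW -> $40B-$100B per year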
Alexander Long retweeted
Pluralis Research@Pluralis·
Factored Gossip DiLoCo (by @ChaminHewa) has been accepted to ICML 2026. It removes the all-reduce required to compute the outer-optimiser step, improving robustness to failed nodes. In a collective training setting, this allows nodes to leave arbitrarily with minimal impact.
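The tweet gives only the high-level idea, so the NumPy sketch below is an illustrative reconstruction rather than the paper's algorithm: each worker runs DiLoCo-style local steps, but the outer pseudo-gradient is averaged with a few randomly sampled peers (gossip) instead of through a global all-reduce, so a node that drops out simply stops being sampled. The function name, the peer-sampling scheme, and the plain-SGD outer step are all assumptions.

import numpy as np

def gossip_diloco_round(thetas, workers, inner_steps, inner_lr, outer_lr, rng, num_peers=2):
    """One outer round of a DiLoCo-style scheme with the outer all-reduce replaced
    by gossip averaging over a few random peers (illustrative sketch only)."""
    deltas = []
    for theta, w in zip(thetas, workers):
        local = theta.copy()
        for _ in range(inner_steps):
            local -= inner_lr * w.grad(local)       # inner optimiser steps on the worker's own data
        deltas.append(theta - local)                # this worker's outer "pseudo-gradient"

    new_thetas = []
    n = len(thetas)
    for i in range(n):
        # Gossip step: mix with a small random subset of peers instead of a global
        # all-reduce; a node that has left the run is simply never sampled.
        peers = rng.choice([j for j in range(n) if j != i],
                           size=min(num_peers, n - 1), replace=False)
        avg_delta = np.mean([deltas[i]] + [deltas[j] for j in peers], axis=0)
        avg_theta = np.mean([thetas[i]] + [thetas[j] for j in peers], axis=0)
        new_thetas.append(avg_theta - outer_lr * avg_delta)   # plain-SGD outer step (assumption)
    return new_thetas

Repeated rounds keep the replicas loosely synchronised without any step ever blocking on every node at once, which is where the robustness to node failures would come from in a scheme of this shape.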
Alexander Long@AlexanderLong·
You’re turning those FLOPs into something that in aggregate is more valuable than the cost of the FLOPs (a good model). Incentivization is very straightforward in this case: you're earning claims on future cash flows into the model. But in this scenario it's absolutely critical that the weights cannot be leaked outside of the protocol, or you have no ability to capture paid use or charge any kind of margin (this is an existential problem for all open-weight companies). This is why we call it protocol learning and not decentralized training - it has an additional, absolutely critical property: collective training with unextractability of the weights. Pretty much everything we are working on is to achieve this property.
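A minimal sketch of the structural point: if the model is pipeline-split across participants, each party materialises only its own stage's weights and exchanges activations, so the full weight set never exists in any one place. The class below is purely illustrative; the tweet doesn't describe Pluralis's actual mechanism for making weights unextractable.

import numpy as np

class StageHolder:
    """One participant in a pipeline-split model: it holds only its own stage's
    weights and only ever sends/receives activations (illustrative sketch)."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.normal(scale=0.02, size=(in_dim, out_dim))   # the ONLY weights this party sees

    def forward(self, activations):
        return np.maximum(activations @ self.W, 0.0)              # activations passed on to the next stage

rng = np.random.default_rng(0)
stages = [StageHolder(64, 64, rng) for _ in range(4)]   # four participants, one stage each

x = rng.normal(size=(8, 64))        # a batch of inputs enters the first stage
for stage in stages:                # only activations ever cross participant boundaries
    x = stage.forward(x)
print(x.shape)                      # (8, 64): output produced while no party held the full weight set

Extracting a usable model from a run like this would require every stage holder to export their shard, which is the property the tweet is pointing at.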
Alexander Long retweeted
Pluralis Research@Pluralis·
The first Protocol Learning workshop in Rio today! A new field is emerging: decentralized training of foundation models. Because the world’s intelligence belongs in open, collectively owned systems. Thanks for joining @oguzer90 @niclane7 @m_ryabinin @namhoonlee09 @itsmaddox_j
Alexander Long retweeted
Tolga Birdal@tolga_birdal·
Modern deep networks are often trained at the #EdgeOfStability, a regime where dynamics are locally unstable, nearing chaos. Yet generalization improves, defying the wisdom of classical optimization. We now theoretically explain this central puzzle: arxiv.org/abs/2604.19740. 👇
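For context on the term: "edge of stability" usually refers to the sharpness (largest Hessian eigenvalue of the loss) hovering around the classical gradient-descent stability threshold of 2/η for step size η. The PyTorch sketch below shows the standard way that quantity is estimated, via power iteration on Hessian-vector products; it is illustrative background, not code from the linked paper.

import torch

def sharpness(loss_fn, params, iters=20):
    """Estimate the largest Hessian eigenvalue of the loss ("sharpness") by power
    iteration on Hessian-vector products. `loss_fn` recomputes the loss from `params`."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((x ** 2).sum() for x in v))
        v = [x / norm for x in v]
        gv = sum((g * x).sum() for g, x in zip(grads, v))           # <grad, v>
        hv = torch.autograd.grad(gv, params, retain_graph=True)     # Hessian-vector product H v
        eig = sum((h * x).sum() for h, x in zip(hv, v)).detach()    # Rayleigh quotient, ||v|| = 1
        v = [h.detach() for h in hv]
    return eig   # classical GD with step size lr is stable only while this stays below 2 / lr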
Alexander Long retweeted
Tongtian Zhu@Tongtian_Zhu·
Heading to Rio for ICLR? 🇧🇷🌴 Excited to share our ICLR 2026 Oral paper 🎉 Communication is becoming a real bottleneck in large-scale training. As clusters scale up, a natural question is: if communication is limited/expensive, how should we allocate the communication "budget" over training? Paper: openreview.net/forum?id=zrFnw…
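The paper's actual allocation rule isn't quoted in the tweet, so the sketch below only frames the question: given T optimiser steps and a budget of B synchronisations, a schedule decides when communication happens. Both schedules shown are my own illustrative examples, not the paper's recommendation.

import numpy as np

def uniform_schedule(total_steps, budget):
    """Spend the communication budget evenly across training."""
    return set(np.linspace(0, total_steps - 1, num=budget, dtype=int))

def front_loaded_schedule(total_steps, budget, power=2.0):
    """Spend more of the budget early in training (an illustrative alternative)."""
    frac = np.linspace(0.0, 1.0, num=budget) ** power
    return set((frac * (total_steps - 1)).astype(int))

T, B = 10_000, 50                       # e.g. 10k optimiser steps, 50 allowed syncs
sync_steps = uniform_schedule(T, B)     # swap in whichever allocation answers the budget question
for step in range(T):
    if step in sync_steps:
        pass                            # parameter sync / all-reduce would happen here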
Alexander Long@AlexanderLong·
Activation compression is fundamentally different to gradient compression as it alters training dynamics, but it must be solved to allow a model to be split up over participants. If you cannot split a model over participants, I don't see how you keep the weight set private, and if you can't keep the weight set private, I don't see how you make collective training sustainable. A genuinely novel sub-field is emerging here - a very rare thing to be able to observe in real time.
Macrocosmos@MacrocosmosAI

Training frontier models over the internet requires new techniques. Today, we present ResBM, a residual encoder-decoder bottleneck architecture that enables 128x activation compression for low-bandwidth distributed pipeline parallel training. Developed for @IOTA_SN9, we show SOTA compression without significant loss in convergence rates, increases in memory, or compute overhead. Expect the full paper release in the next 72 hours.

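The full ResBM paper hadn't been released at the time of the quoted tweet, so the sketch below only illustrates the general shape of the idea: a learned bottleneck at a pipeline boundary so that only a much smaller code crosses the low-bandwidth link, with the decoder restoring the original width on the receiving stage. The module names, dimensions, and the simple linear encoder/decoder are placeholders, not the actual ResBM architecture.

import torch
import torch.nn as nn

class ActivationCompressor(nn.Module):
    """Encoder half of a bottleneck at a pipeline boundary: runs on the sending
    stage; only its (much smaller) output crosses the network link."""
    def __init__(self, hidden_dim=4096, code_dim=32):    # 4096 -> 32 floats is a 128x reduction
        super().__init__()
        self.proj = nn.Linear(hidden_dim, code_dim)

    def forward(self, activations):
        return self.proj(activations)

class ActivationDecompressor(nn.Module):
    """Decoder half: runs on the receiving stage and restores the original width
    before the next block of the pipeline consumes it."""
    def __init__(self, hidden_dim=4096, code_dim=32):
        super().__init__()
        self.proj = nn.Linear(code_dim, hidden_dim)

    def forward(self, code):
        return self.proj(code)

# Shapes only; no claim about the real design.
send, recv = ActivationCompressor(), ActivationDecompressor()
x = torch.randn(2, 1024, 4096)     # [batch, sequence, hidden] activations at the cut point
code = send(x)                     # torch.Size([2, 1024, 32])  -- what goes over the wire
y = recv(code)                     # torch.Size([2, 1024, 4096]) -- pipeline resumes at full width

Unlike gradient compression, a bottleneck like this sits in the forward path, so whatever it loses is what the rest of the network trains on; that is the sense in which activation compression changes training dynamics.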
Alexander Long retweeted
kel.@kelxyz_·
@chamath have your team read this. just checked and looks like nobody in the comments has shared. but yea have felt for a while that this team is taking the most thorough approach, and most people aren't privy to their research pluralis.ai/blog/pluralis-…
Alexander Long retweeted
Chamath Palihapitiya@chamath·
If Martin is right, he also just wrote the product spec for open source + distributed compute where broad swaths of groups, individuals and organizations contribute their compute resources to training runs for large param open source models. There are lots of issues in figuring this out: homogeneity vs heterogeneity of the training clusters, orchestration, financial incentives etc etc etc but some early projects are good signal as to where this can go and that these limitations can be overcome (folding@home, Venice, Tao). An attempted oligopoly on intelligence is the perfect boundary condition for a bottom-up uprising of fully open, fully distributed AI.
martin_casado@martin_casado

It's only a matter of time before only the model creators have access to the most powerful models. The rest get access to smaller, distilled versions. Or access the models through first party apps and services that don't provide direct access to the token path. The investment needs for training are too high, and distillation too effective to warrant any other future.

Alexander Long retweeted
clem 🤗@ClementDelangue·
I think it’s @NaveenGRao who said it before but wouldn’t be surprised if the frontier labs cut their APIs entirely at some point. In a compute constrained world, they’ll always prioritize their own direct products/customers. Makes it scary and unsustainable to only build on top of their APIs!