Alexander Long

473 posts

@AlexanderLong

Founder @Pluralis | ML PhD

Joined July 2023
1.3K Following · 3.2K Followers
Pinned Tweet
Alexander Long@AlexanderLong·
Since I started getting interested in ML I got it in my head that all I wanted to do was one smart thing that I could look back on and be satisfied that I did. Most papers are kinda bad even if they get accepted - the idea is very incremental, or it's just not that good an idea, or it doesn't really matter. I never was able to do this all through PhD or my time at Amazon. All the papers I did there got into various places, but I never really thought they were actually that good. And I'd pretty much given up on this because Pluralis meant I couldn't really devote enough time to research myself. But in February I decided I didn't care and spent two months focused on a specific problem that had been going round in my head for about a year that I felt we needed to solve, and the solution came to me, and @ChaminHewa picked it up and generalised the approach and ran a bunch of novel experiments I hadn't thought of, and pulled everything together into an actual paper. And yesterday we presented this work at NeurIPS. This is the first and probably only work I will ever do that for me feels like "ok that was GOOD". I don't care if it racks up a bunch of citations and disperses into the field or not, I don't care if someone repackages the ideas and takes all the credit for it, I don't care. For me there is an internal checkbox that just got ticked after more than ten years of trying. Anyone in ML will understand what I'm trying to say. Special day I'm going to remember for a long time.
Squid@airtightfish·
@kelxyz_ “A 1 GW AI data center can plausibly generate around $10B–$25B/year in compute revenue, depending on GPU type, utilization, and pricing.” So what’s 4 worth?
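For scale, here is the implied back-of-the-envelope arithmetic, assuming the quoted $10B–$25B/year-per-GW range simply scales linearly with capacity (the linear scaling and the snippet below are illustrative, not from the tweet):

# Rough scaling of the quoted revenue estimate (illustrative only).
low_per_gw, high_per_gw = 10e9, 25e9          # quoted $/year range for a 1 GW AI data center
capacity_gw = 4                               # the "what's 4 worth?" case

low, high = capacity_gw * low_per_gw, capacity_gw * high_per_gw
print(f"{capacity_gw} GW -> ${low/1e9:.0f}B-${high/1e9:.0f}B per year")   # 4 GW -> $40B-$100B per year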
Alexander Long retweeted
Pluralis Research@Pluralis·
Factored Gossip DiLoCo (by @ChaminHewa) has been accepted to ICML 2026. It removes the all-reduce required to compute the outer-optimiser step, improving robustness to failed nodes. In a collective training setting, this allows nodes to leave arbitrarily with minimal impact.
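The tweet gives only the high-level idea, so the NumPy sketch below is an illustrative reconstruction rather than the paper's algorithm: each worker runs DiLoCo-style local steps, but the outer pseudo-gradient is averaged with a few randomly sampled peers (gossip) instead of through a global all-reduce, so a node that drops out simply stops being sampled. The function name, the peer-sampling scheme, and the plain-SGD outer step are all assumptions.

import numpy as np

def gossip_diloco_round(thetas, workers, inner_steps, inner_lr, outer_lr, rng, num_peers=2):
    """One outer round of a DiLoCo-style scheme with the outer all-reduce replaced
    by gossip averaging over a few random peers (illustrative sketch only)."""
    deltas = []
    for theta, w in zip(thetas, workers):
        local = theta.copy()
        for _ in range(inner_steps):
            local -= inner_lr * w.grad(local)       # inner optimiser steps on the worker's own data
        deltas.append(theta - local)                # this worker's outer "pseudo-gradient"

    new_thetas = []
    n = len(thetas)
    for i in range(n):
        # Gossip step: mix with a small random subset of peers instead of a global
        # all-reduce; a node that has left the run is simply never sampled.
        peers = rng.choice([j for j in range(n) if j != i],
                           size=min(num_peers, n - 1), replace=False)
        avg_delta = np.mean([deltas[i]] + [deltas[j] for j in peers], axis=0)
        avg_theta = np.mean([thetas[i]] + [thetas[j] for j in peers], axis=0)
        new_thetas.append(avg_theta - outer_lr * avg_delta)   # plain-SGD outer step (assumption)
    return new_thetas

Repeated rounds keep the replicas loosely synchronised without any step ever blocking on every node at once, which is where the robustness to node failures would come from in a scheme of this shape.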
Alexander Long@AlexanderLong·
You’re turning those FLOPs into something that in aggregate is more valuable than the cost of the FLOPs (a good model). Incentivization is very straightforward in this case: you're earning claims on future cash flows into the model. But in this scenario it's absolutely critical that the weights cannot be leaked outside of the protocol, or you have no ability to capture paid use or charge any kind of margin (this is an existential problem for all open-weight companies). This is why we call it protocol learning and not decentralized training - it has an additional, absolutely critical property: collective training with unextractability of the weights. Pretty much everything we are working on is to achieve this property.
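A minimal sketch of the structural point: if the model is pipeline-split across participants, each party materialises only its own stage's weights and exchanges activations, so the full weight set never exists in any one place. The class below is purely illustrative; the tweet doesn't describe Pluralis's actual mechanism for making weights unextractable.

import numpy as np

class StageHolder:
    """One participant in a pipeline-split model: it holds only its own stage's
    weights and only ever sends/receives activations (illustrative sketch)."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.normal(scale=0.02, size=(in_dim, out_dim))   # the ONLY weights this party sees

    def forward(self, activations):
        return np.maximum(activations @ self.W, 0.0)              # activations passed on to the next stage

rng = np.random.default_rng(0)
stages = [StageHolder(64, 64, rng) for _ in range(4)]   # four participants, one stage each

x = rng.normal(size=(8, 64))        # a batch of inputs enters the first stage
for stage in stages:                # only activations ever cross participant boundaries
    x = stage.forward(x)
print(x.shape)                      # (8, 64): output produced while no party held the full weight set

Extracting a usable model from a run like this would require every stage holder to export their shard, which is the property the tweet is pointing at.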
Alexander Long retweeted
Pluralis Research@Pluralis·
The first Protocol Learning workshop in Rio today! A new field is emerging: decentralized training of foundation models. Because the world’s intelligence belongs in open, collectively owned systems. Thanks for joining @oguzer90 @niclane7 @m_ryabinin @namhoonlee09 @itsmaddox_j
Alexander Long retweeted
Tolga Birdal@tolga_birdal·
Modern deep networks are often trained at the #EdgeOfStability, a regime where dynamics are locally unstable, nearing chaos. Yet generalization improves, defying the wisdom of classical optimization. We now theoretically explain this central puzzle: arxiv.org/abs/2604.19740. 👇
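For context on the term: "edge of stability" usually refers to the sharpness (largest Hessian eigenvalue of the loss) hovering around the classical gradient-descent stability threshold of 2/η for step size η. The PyTorch sketch below shows the standard way that quantity is estimated, via power iteration on Hessian-vector products; it is illustrative background, not code from the linked paper.

import torch

def sharpness(loss_fn, params, iters=20):
    """Estimate the largest Hessian eigenvalue of the loss ("sharpness") by power
    iteration on Hessian-vector products. `loss_fn` recomputes the loss from `params`."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((x ** 2).sum() for x in v))
        v = [x / norm for x in v]
        gv = sum((g * x).sum() for g, x in zip(grads, v))           # <grad, v>
        hv = torch.autograd.grad(gv, params, retain_graph=True)     # Hessian-vector product H v
        eig = sum((h * x).sum() for h, x in zip(hv, v)).detach()    # Rayleigh quotient, ||v|| = 1
        v = [h.detach() for h in hv]
    return eig   # classical GD with step size lr is stable only while this stays below 2 / lr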
Alexander Long retweeted
Tongtian Zhu@Tongtian_Zhu·
Heading to Rio for ICLR? 🇧🇷🌴 Excited to share our ICLR 2026 Oral paper 🎉 Communication is becoming a real bottleneck in large-scale training. As clusters scale up, a natural question is: if communication is limited/expensive, how should we allocate the communication "budget" over training? Paper: openreview.net/forum?id=zrFnw…
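The paper's actual allocation rule isn't quoted in the tweet, so the sketch below only frames the question: given T optimiser steps and a budget of B synchronisations, a schedule decides when communication happens. Both schedules shown are my own illustrative examples, not the paper's recommendation.

import numpy as np

def uniform_schedule(total_steps, budget):
    """Spend the communication budget evenly across training."""
    return set(np.linspace(0, total_steps - 1, num=budget, dtype=int))

def front_loaded_schedule(total_steps, budget, power=2.0):
    """Spend more of the budget early in training (an illustrative alternative)."""
    frac = np.linspace(0.0, 1.0, num=budget) ** power
    return set((frac * (total_steps - 1)).astype(int))

T, B = 10_000, 50                       # e.g. 10k optimiser steps, 50 allowed syncs
sync_steps = uniform_schedule(T, B)     # swap in whichever allocation answers the budget question
for step in range(T):
    if step in sync_steps:
        pass                            # parameter sync / all-reduce would happen here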
Alexander Long@AlexanderLong·
Activation compression is fundamentally different to gradient compression as it alters training dynamics, but it must be solved to allow a model to be split up over participants. If you cannot split a model over participants, I don't see how you keep the weight set private, and if you can't keep the weight set private, I don't see how you make collective training sustainable. A genuinely novel sub-field is emerging here - a very rare thing to be able to observe in real time.
Macrocosmos@MacrocosmosAI

Training frontier models over the internet requires new techniques. Today, we present ResBM, a residual encoder-decoder bottleneck architecture that enables 128x activation compression for low-bandwidth distributed pipeline parallel training. Developed for @IOTA_SN9, we show SOTA compression without significant loss in convergence rates, increases in memory, or compute overhead. Expect the full paper release in the next 72 hours.

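The full ResBM paper hadn't been released at the time of the quoted tweet, so the sketch below only illustrates the general shape of the idea: a learned bottleneck at a pipeline boundary so that only a much smaller code crosses the low-bandwidth link, with the decoder restoring the original width on the receiving stage. The module names, dimensions, and the simple linear encoder/decoder are placeholders, not the actual ResBM architecture.

import torch
import torch.nn as nn

class ActivationCompressor(nn.Module):
    """Encoder half of a bottleneck at a pipeline boundary: runs on the sending
    stage; only its (much smaller) output crosses the network link."""
    def __init__(self, hidden_dim=4096, code_dim=32):    # 4096 -> 32 floats is a 128x reduction
        super().__init__()
        self.proj = nn.Linear(hidden_dim, code_dim)

    def forward(self, activations):
        return self.proj(activations)

class ActivationDecompressor(nn.Module):
    """Decoder half: runs on the receiving stage and restores the original width
    before the next block of the pipeline consumes it."""
    def __init__(self, hidden_dim=4096, code_dim=32):
        super().__init__()
        self.proj = nn.Linear(code_dim, hidden_dim)

    def forward(self, code):
        return self.proj(code)

# Shapes only; no claim about the real design.
send, recv = ActivationCompressor(), ActivationDecompressor()
x = torch.randn(2, 1024, 4096)     # [batch, sequence, hidden] activations at the cut point
code = send(x)                     # torch.Size([2, 1024, 32])  -- what goes over the wire
y = recv(code)                     # torch.Size([2, 1024, 4096]) -- pipeline resumes at full width

Unlike gradient compression, a bottleneck like this sits in the forward path, so whatever it loses is what the rest of the network trains on; that is the sense in which activation compression changes training dynamics.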
Alexander Long retweeted
kel.@kelxyz_·
@chamath have your team read this. just checked and looks like nobody in the comments has shared. but yea have felt for a while that this team is taking the most thorough approach, and most people aren't privy to their research pluralis.ai/blog/pluralis-…
Alexander Long retweeted
Chamath Palihapitiya@chamath·
If Martin is right, he also just wrote the product spec for open source + distributed compute where broad swaths of groups, individuals and organizations contribute their compute resources to training runs for large param open source models. There are lots of issues in figuring this out: homogeneity vs heterogeneity of the training clusters, orchestration, financial incentives etc etc etc but some early projects are good signal as to where this can go and that these limitations can be overcome (folding@home, Venice, Tao). An attempted oligopoly on intelligence is the perfect boundary condition for a bottom-up uprising of fully open, fully distributed AI.
martin_casado@martin_casado

It's only a matter of time before only the model creators have access to the most powerful models. The rest get access to smaller, distilled versions. Or access the models through first party apps and services that don't provide direct access to the token path. The investment needs for training are too high, and distillation too effective to warrant any other future.

Alexander Long retweeted
clem 🤗@ClementDelangue·
I think it’s @NaveenGRao who said it before but wouldn’t be surprised if the frontier labs cut their APIs entirely at some point. In a compute constrained world, they’ll always prioritize their own direct products/customers. Makes it scary and unsustainable to only build on top of their APIs!