Alexander Long

450 posts

@AlexanderLong

Founder @Pluralis | ML PhD

Joined July 2023
1.2K Following · 3.1K Followers
Pinned Tweet
Alexander Long@AlexanderLong·
Since I started getting interested in ML I got it in my head that all I wanted to do was one smart thing that I could look back on and be satisfied that I did. Most papers are kinda bad even if they get accepted: the idea is very incremental, or it's just not that good an idea, or it doesn't really matter. I never was able to do this all through my PhD or my time at Amazon. All the papers I did there got into various places, but I never really thought they were actually that good. And I'd pretty much given up on this, because Pluralis meant I couldn't really devote enough time to research myself.

But in February I decided I didn't care and spent two months focused on a specific problem that had been going round in my head for about a year that I felt we needed to solve, and the solution came to me, and @ChaminHewa picked it up and generalised the approach and ran a bunch of novel experiments I hadn't thought of, and pulled everything together into an actual paper. And yesterday we presented this work at NeurIPS.

This is the first and probably only work I will ever do that for me feels like "ok that was GOOD". I don't care if it racks up a bunch of citations and disperses into the field or not, I don't care if someone repackages the ideas and takes all the credit for it, I don't care. For me there is an internal checkbox that just got ticked after more than ten years of trying. Anyone in ML will understand what I'm trying to say. Special day I'm going to remember for a long time.
Dylan Patel@dylan522p·
Deepseek v4 still not released
Alibaba Qwen going closed
Western open weights models slacking

In these dark times for open source, who will save us? Alliances must be made, brothers must band together! A world of only closed source AI will lead to consolidation of power! Tyranny!
Alexander Long retweeted
Jake Brukhman@jbrukh·
There's a lot of really exciting developments happening in decentralized AI training this year. Here's my take on why decentralized training is moving from "impossible" to "investable". 🧵👇
Alexander Long retweeted
Thalaiyasingam Ajanthan@tha_ajanthan·
Inspired by @DimitrisPapail's and @karpathy's experiments, I’ve been running some research experiments with Claude Code — I call it Agent Descent (analogous to graduate student descent). The idea is to improve our Subspace Nets (SSN) paper (communication-efficient pipeline parallelism, arxiv.org/abs/2506.01260) in a setup where it underperforms the uncompressed baseline for the same token budget. The research direction is to use a mixture of SSNs, where each DP replica optimizes on a different subspace. The agent can tune subspace selection, DP gradient aggregation, etc., and search arXiv, but cannot change the optimizer or other training hyperparameters.

My setup is far from optimal, but the agent was able to make notable improvements on its own without any specific guidance/mention in the prompt. E.g.:
1. Identified that GaLore-based gradient projection might be useful
2. Suggested per-layer subspaces to improve expressivity (we have been working on this in a separate project)

It also did a bit of cheating (e.g., uncompressing/reducing compression for some layers). See below for a summary of experiments it tried. For me personally, this was a reality check — and a glimpse of how research might look in the coming months.
Alexander Long@AlexanderLong·
@zachtratar we have a pretty disproportionate research output for <10 person team. work on architectures you can collaboratively train
Zach Tratar@zachtratar·
Are there any new startups attempting to become frontier labs? I’m not talking about SSI or Thinking Machines… smaller. More of the dark horse vibe team…
Alexander Long@AlexanderLong·
insane sequence of statements buried in an Alibaba tech report
Alexander Long@AlexanderLong·
@natolambert commoditizing the complement doesn't make a lot of sense when the thing you're trying to commoditize is the most capital intensive part of the entire stack. Hopefully meta hangs in there for a while at least though.
Alexander Long retweeted
Nathan Lambert@natolambert·
An existential risk for near term open-weight models. In the coming years, the only places with business reasons for building them:
1) non-profits -- good for research/the world
2) Nvidia's -- keep their hardware up with AI
3) Meta's -- commoditize their complements
Carl Franzen@carlfranzen

@jeremyphoward Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source venturebeat.com/technology/did…

Alexander Long@AlexanderLong·
It's a while before we'll know for sure, and maybe Qwen's going closed now and maybe it's not, but the way I always thought about things was that you can't spend 100M/year (and in many cases much more) on research training costs, then give the result away for free under an open licence, and have this be a sustainable state of affairs. I don't know why this was so controversial. I can't count the number of times someone told me "but they're commoditising the complement" as if that somehow made sense and explained things.

It's important because effectively the entire AI ecosystem outside of the closed labs is predicated on the assumption that close-to-frontier models will be available for free, indefinitely. On-prem, enterprise post-trains, all local AI, all of it depends on that assumption being true. My view is decentralised training where you don't leak the weight-sets (i.e. protocol learning, unextractable protocol models etc.) is the only way out if that assumption is wrong.
Jake Brukhman@jbrukh

This is crazy. Broad consensus, even from observers and investors in China, has been that Chinese models will remain open. According to this event, that's potentially starting to shift. I know one person who was ultra-convicted on this very contrarian view of closed-source Chinese models, and that's @AlexanderLong. I had chortled when he told me this view, but now I am seeing data points that he's right and this theory has legs. Closed Chinese models of course mean there starts to be a big gap between closed and open SOTAs. And decentralized AI training networks fill that gap!

Alexander Long@AlexanderLong·
@kevinsxu Whether true or not, all you have to do is read the MiniMax and Z.ai IPO prospectuses for two minutes for it to be extremely clear the open-weight labs have no option but to go closed eventually. Bizarre to me this was so counter to mainstream opinion for the last 6 months
Kevin S. Xu@kevinsxu·
Based on the latest rumor mill, looks like two things happened:
1. CEO of Alibaba Cloud (who is btw the CEO of all of Alibaba) is exerting a more direct line of sight on Qwen
2. A new person, possibly someone who was ex-Gemini team, is brought in and layered on top of current Qwen leaders, thus the mass exodus

If true, it looks like future advanced Qwen models might become closed soon, as Alibaba tries to replicate the GCP/Gemini playbook. As a *pure business decision*, this actually makes sense... (this is not at all to diminish all the hard work, goodwill, and open source community building that the current Qwen team did to get Qwen to where it is today in the first place.)

Alibaba and Google are the *only* tech companies that have *both* in-house frontier AI models *and* a sizable and global 3rd-party cloud business that needs to grow even bigger with AI adoption. (Azure/AWS: great clouds, no in-house models, OAI is playing both sides. All other AI labs have no standalone cloud business.) GCP grew by a whopping 48% last year. AliCloud is nowhere near that and starting from a smaller base.

On paper, bringing in a Gemini person and being more commercialization focused, which always means closing not opening more models, appears logical as a short- to mid-term business decision... But just because you signed someone who was on a Super Bowl team doesn't mean you'll win the Super Bowl too.

Meanwhile, this resignation exodus is a bad look and losing lots of goodwill...
Kevin S. Xu@kevinsxu

Watching Qwen team implode on Twitter is sad to see... Looks like Qwen will go the route of closed models soon AliCloud gotta make money somehow I guess... (Worth noting $BABA earnings date still not announced, more delayed than usual...)

Alexander Long retweeted
Nathan Lambert@natolambert·
The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable. I’ll do my best to keep carrying that torch (not that I’ve reached the level of impact of Qwen by any means). Every bit matters.
Alexander Long@AlexanderLong·
I think the argument is: unlike open-source software, which is someone donating time, open-source AI is someone donating huge amounts of money to train the model and then giving the result away for free. This has problems and we can't rely on it. The Z.ai and MiniMax prospectuses make this very clear. There is a third path here, not open-weight and not closed labs... we publish a lot about this.
the tiny corp@__tinygrad__·
I'm not sure why any AI researchers continue to work at closed source labs. You know the money won't be worth anything. Hopefully it's clear now you won't get any control. And you are on the wrong side of history. Be a scientist, join a lab where you can publish.
the tiny corp@__tinygrad__·
TIL that Linux isn't free cause I had to buy the computer to run it 😭 This is the biggest load of cope I have ever heard. I hope Opus gets smoked by DeepSeek v4 and the only people who continue to use closed source models are Windows users.
Dustin@r0ck3t23

Dario Amodei just dismantled the biggest myth in the AI industry. Open source AI isn’t free. It never was. Amodei: “It’s not free. You have to run it on inference and someone has to make it fast on inference.” For decades, open source meant something real. It meant a teenager in a basement could download the same tools as a Fortune 500 company. Could read the code. Could modify it. Could build something that competed with the giants. That was genuine democratization. That actually happened. AI is different. Fundamentally. Physically. In ways the ideology hasn’t caught up to yet. Downloading the weights is the easy part. The part that actually costs something is turning the weights into a running system. Into responses. Into intelligence operating in real time at scale. That requires compute. Power. Infrastructure. The kind measured in billions of dollars and years of construction. Amodei: “These are big models. They’re hard to do inference on. Ultimately you have to host it on the cloud. The people who host it on the cloud do inference.” The open source debate was never about who owns the model. It was always about who owns the cloud. And Amodei goes further. When a competitor drops a new open model, he doesn’t ask whether it’s open or closed. He doesn’t care about the licensing. He doesn’t engage the ideology. Amodei: “I don’t think it mattered that DeepSeek is open source. I think I ask, is it a good model? Is it better than us at the things that matter? That’s the only thing that I care about.” That’s the ruthless clarity of someone actually trying to win. While the media debates licensing frameworks, Amodei is asking one question. Is it better. Everything else is a distraction. Amodei: “I don’t think open source works the same way in AI that it has worked in other areas. Here we can’t see inside the model.” This isn’t Linux. You can’t read it. You can’t fork it. You can’t understand it the way generations of developers understood the tools they inherited. 
You can download it. And then you need a data center to run it. The teenager in the basement who was supposed to be empowered by this revolution needs a billion dollars of infrastructure before the empowerment starts. The era of the basement coder rewriting civilization on a laptop is over. The future belongs to whoever commands the compute, owns the power grid, and can actually turn the intelligence on. Open weights without infrastructure isn’t democratization. It’s a promise the physics of the universe won’t let us keep.

Alexander Long retweeted
Antidote@AxelMtbr·
That’s true! Big fan of exo, but this is a reason why I still eye decentralised training projects like @PrimeIntellect or @Pluralis. I feel like this is still a blind spot when it comes to sovereignty. We need decentralised economic systems that can develop frontier open-weight models consistently. Or else self-hosting might not be viable long term.
Alexander Long@AlexanderLong·
Forget which party is in power now and just imagine the party you dislike in control of a technology that influences how you and everyone around you interprets reality.
Alexander Long@AlexanderLong·
@bloomberg_seth am confident we're going to be able to do verification with basically no overhead.
seth bloomberg@bloomberg_seth·
The decentralized training bull case is lining up quite well:
• Massive + growing amount of idle compute (consumer + dc) looking for >$0 ROI
• Giga-brained teams have largely solved the AI research probs, moving to engineering work

Verification (at scale) likely remains the thing to be solved, and this is non-trivial given it's part of the economic tradeoff. Supply-side costs of these networks will be structurally lower (they can spend much less than market rates since they are getting idle compute), but they introduce a fundamentally new expense category, the cost of verification, which centralized settings don't have.