Alexander Doria

46.1K posts

@Dorialexander

building open ai infrastructure @pleiasfr

Joined April 2011
4K Following · 23.2K Followers
Alexander Doria@Dorialexander·
Great overview of world model research. Particularly liked this part, about the lead example in the open, Cosmos: "what makes it a world model is the training data".
Julia Turc@juliarturc

"World models" is one of the buzziest yet ambiguous terms in AI right now. I started this video with many questions:
- How are they different from video generation?
- Can they do more than AI slop?
- Can LeCun be trusted given that he wears knee-high white socks?
Many thanks to @tjgalda and @NVIDIAAI for helping me answer (most) of these questions!

Alexander Doria@Dorialexander·
@ObscureLocal I do have a slight interest but I believe the critical thing will be the availability of synthetic pipelines + post-training methods (especially for smaller MoE with super cheap inference). Compute isn’t the main blocker.
Obscure Local Historian@ObscureLocal·
Very interesting. I think the economics are also starting to make sense for U.S. companies to take the China path. I myself am contemplating whether now is the time to begin doing this as an individual, which is maybe telling of some other future event.

I was reading a few days ago that it might cost very little right now, less than a hundred dollars, to train something like a 9B model on Colab. I haven't verified that yet, but if it were the case, Training As A Service is not far away. I'd very much like to organize my training data, upload it, pick my arch and hyperparameters in a dashboard, and put in coins.

In the long run, the shape of things may even come to resemble that. Like software was created to automate simple verifiable tasks, it will not necessarily be efficient to do everything with a 100 trillion parameter super-AI. It may be better to use that AI to build an edge model that can solve your repeatable process problem well. So the shape of the AI economy comes at some point to somewhat resemble the former shape of the software economy, perhaps.

Another thought is what happens when diffuser economics do become well optimized. It was maybe last year or earlier this year that there was a lot of speculation about generative user interfaces, and I see people making purely generative web servers and such things. Maybe then "the model IS the product". 😅
Alexander Doria@Dorialexander·
@advait_jayant Yes. To some extent they had surprisingly dated priors (also about scaling: highly sparse MoEs were not what they had in mind)
Advait@advait_jayant·
@Dorialexander funnily enough ai-2027 got itself backwards on china. it had the ccp nationalizing everything into a single megalab next to a nuclear plant. the opposite happened. deepseek, qwen, kimi, minimax, and as you pointed out even meituan and xiaomi are all shipping their own models!
52dsl@52dsl·
@Dorialexander Very interesting. Thanks! (Though a piece of this sentence seems to be missing: "By 2023, OpenAI and Anthropic had an early lead in model development but nothing that would prevent...")
Alexander Doria@Dorialexander·
@ed_brz9 @jackson_stokes Thought briefly about that but disagree now. Memory is actually needed for many agentic processes (basically the model needs to understand what is being asked, and how to look for things)
Ed Brz9@ed_brz9·
@Dorialexander @jackson_stokes For now I just do synth data and fine-tuning, so I'm not at all qualified to talk about model architecture, but do LLMs need to know much to be useful? I feel like more weights dedicated to attention could benefit agentic capabilities. Near-zero-knowledge models that can use tools
Jackson Stokes@jackson_stokes·
There seems to be a ~3B lower limit for useful LLMs. Below that, instruction following and ICL drop off a cliff? Is there some fundamental reason for this?
Alexander Doria@Dorialexander·
and here is anthropic soft power: blessed be circuit transformers (and the data that feed it).
Alexander Doria@Dorialexander·
great seeing the pope supporting tokenizer research.
Alexander Doria@Dorialexander·
@Noahpinion Just creating conditions for proper open-ended research (the kind OpenAI has just started to emulate) and selecting for people willing to do that.
Alexander Doria@Dorialexander·
@JulienBlanchon @bastiengares (truly corporate documents in some domains are a total pain to locate in a web crawl, and the volumes just aren't there: I understand the logic of buying up company data)
Alexander Doria@Dorialexander·
@JulienBlanchon @bastiengares That's roughly what I've been hearing from the Anthropic side (and from some RL providers): a migration from constrained environments with a fair amount of domain knowledge toward more open-ended data workflows, basically less structured ones.
Rasmus@synquid·
Synthetic environments are definitely the next big thing