Milad M @_miladm_ · 139 posts
PyTorch @Meta Superintelligence Labs - Ex: @Google, @Stanford, @Nvidia, @Apple
Stanford, CA · Joined August 2012
407 Following · 213 Followers
Milad M retweeted
Alexandr Wang @alexandr_wang
the muse spark API will be coming soon! we have been thrilled with the amount of excitement amongst developers who want to try muse spark inside their agentic harnesses. stay tuned!
125 replies · 88 reposts · 1.8K likes · 161.1K views
Milad M retweeted
Alexandr Wang @alexandr_wang
Meta AI is up to #6 in the App Store overnight, and still growing :) Also who knew the 7-Eleven app was so popular
[image]
206 replies · 79 reposts · 2.2K likes · 268.7K views
Milad M retweeted
Alexandr Wang @alexandr_wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
[image]
729 replies · 1.2K reposts · 10.3K likes · 4.5M views
Milad M retweeted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of a project I thought was already fairly well tuned by hand. This is a first for me, because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, and so on. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used them to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually before, and they stack up and actually improved nanochat. Among the bigger findings:

- It noticed an oversight: my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization, and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I had already done over a good amount of time. The exact commit from this "round 1" of autoresearch is here: github.com/karpathy/nanoc… I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, have them collaborate to tune smaller models, promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. More generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[image]
966 replies · 2.1K reposts · 19.5K likes · 3.6M views
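Karpathy's first bullet, the missing scale multiplier on parameterless QK-norm, is easy to see in a toy example. The sketch below is a minimal numpy illustration, not nanochat's actual code; the function names and scale values are hypothetical. With unit-normalized queries and keys, each logit is a cosine similarity bounded in [-1, 1], so without a multiplier the softmax stays near-uniform, and a scalar multiplier sharpens it:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def qk_norm_attention(q, k, scale=1.0):
    # Parameterless QK-norm: project queries and keys onto the unit
    # sphere, so every logit is a cosine similarity in [-1, 1].
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    # Without a scale multiplier the logits are small and the attention
    # distribution stays close to uniform ("too diffuse").
    return softmax(scale * (qn @ kn.T))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = rng.normal(size=(4, 16))

diffuse = qk_norm_attention(q, k, scale=1.0)   # near-uniform rows
sharp = qk_norm_attention(q, k, scale=12.0)    # peaked rows
```

Whether that multiplier should be fixed or learned per head is exactly the kind of knob an autoresearch loop can sweep.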
Milad M retweeted
Fei-Fei Li @drfeifei
This came as a total surprise this morning. Very humbled… 🙏 AI is built by generations of technologists, starting with the daring question of “can machines think?” by Alan Turing. It will be further developed, used and governed by many and all of us! Let’s keep our AI mission human-centered for the benefit of humanity! And I can’t wait to see where AI’s next frontier - spatial intelligence - will be taking us!
TIME @TIME

2025 was the year when artificial intelligence’s full potential roared into view, and when it became clear that there will be no turning back. For delivering the age of thinking machines, for wowing and worrying humanity, for transforming the present and transcending the possible, the Architects of AI are TIME’s 2025 Person of the Year. time.com/7339685/person…

164 replies · 284 reposts · 2.6K likes · 338.9K views
Milad M retweeted
TIME @TIME
2025 was the year when artificial intelligence’s full potential roared into view, and when it became clear that there will be no turning back. For delivering the age of thinking machines, for wowing and worrying humanity, for transforming the present and transcending the possible, the Architects of AI are TIME’s 2025 Person of the Year. time.com/7339685/person…
[image]
2.7K replies · 2.3K reposts · 8.2K likes · 18.7M views
Milad M retweeted
PyTorch @PyTorch
Today’s "Inside Helion Live Q&A" brought the #PyTorch community together with Jason Ansel, @oguz_ulgen, @weifengpy, and Jongsok Choi from @Meta’s PyTorch Compiler and Helion teams. The discussion explored how Helion approaches kernel authoring, #AIInfrastructure performance, and autotuning at scale. 🖇️ Watch the full recording: youtube.com/live/_gIyr1BVU… More from Jason Ansel at #PyTorch Conference 2025: youtu.be/BW-Ht-5IxgM?si… Read the Helion blog: pytorch.org/blog/helion/
[two YouTube video embeds]
3 replies · 10 reposts · 71 likes · 8.6K views
Milad M retweeted
Andrew Ng @AndrewYNg
Hanging out with Project Jupyter co-founder @ellisonbg. If not for him and @fperez_org we wouldn’t have the coding notebooks we use daily in AI and Data Science. Very grateful to him and the whole Jupyter team for this wonderful open-source work!
[image]
44 replies · 87 reposts · 936 likes · 71.1K views
Milad M retweeted
maharshi @maharshii
it is common knowledge but i absolutely love how torch.compile is able to fuse separate function calls doing some elementwise ops into a single Triton kernel. it's like witnessing magic. we don't appreciate torch.compile enough.
[two images]
4 replies · 5 reposts · 221 likes · 12.2K views
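What that fusion buys can be sketched without Triton at all. The toy below is plain numpy/Python with hypothetical function names, not what the compiler actually generates: it contrasts three separate elementwise passes, each reading and writing a full array, with one fused loop that keeps every intermediate in a local variable:

```python
import numpy as np

SQRT_2_OVER_PI = np.sqrt(2.0 / np.pi)

def gelu(t):
    # tanh-approximation GELU, itself a chain of elementwise ops
    return 0.5 * t * (1.0 + np.tanh(SQRT_2_OVER_PI * (t + 0.044715 * t**3)))

def unfused(x, bias):
    # Three separate "kernels": each line materializes a full array.
    y = x + bias
    y = gelu(y)
    return y * 2.0

def fused(x, bias):
    # What a fused kernel does instead: a single pass over the data,
    # computing the whole chain per element so the intermediates never
    # round-trip through memory.
    out = np.empty_like(x)
    flat_in, flat_out = x.ravel(), out.ravel()
    for i in range(flat_in.size):
        t = flat_in[i] + bias
        t = 0.5 * t * (1.0 + np.tanh(SQRT_2_OVER_PI * (t + 0.044715 * t**3)))
        flat_out[i] = t * 2.0
    return out
```

On a GPU the unfused version is bandwidth-bound because each intermediate array travels to and from device memory; fusing the chain into one kernel removes those round trips, which is where the speedup comes from.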
Milad M retweeted
Lilian Weng @lilianweng
GPUs are expensive, and setting up the infrastructure to make them work properly is complex, which makes experimentation on cutting-edge models challenging for researchers and ML practitioners. Providing high-quality research tooling is one of the most effective ways to improve the research productivity of the wider community, and the Tinker API is one step toward our mission there. The Tinker API is built on top of our experimental results on fine-tuning with LoRA: thinkingmachines.ai/blog/lora/ The beta starts today and you can join the waitlist: thinkingmachines.ai/tinker/
[image]
47 replies · 131 reposts · 2.1K likes · 204K views
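The LoRA idea the Tinker API builds on fits in a few lines. This is a generic numpy sketch of low-rank adaptation, not Tinker's API; all names and sizes are illustrative. A frozen weight W is augmented with a trainable low-rank product B·A, so only 2·r·d parameters train instead of d²:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hidden size and LoRA rank, with r << d

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus low-rank update: W x + B (A x).
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
# Zero-initializing B makes the adapted model match the base model
# exactly at the start of fine-tuning.
assert np.allclose(lora_forward(x), W @ x)
# Trainable parameters: 2 * r * d = 512, versus d * d = 4096 for full FT.
```

The small trainable footprint is what makes serving many fine-tunes on shared GPUs economical: only A and B differ per user.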
Milad M retweeted
Mira Murati @miramurati
Sharing our second Connectionism research post on Modular Manifolds, a mathematical approach to refining training at each layer of the neural network
Thinking Machines @thinkymachines

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. thinkingmachines.ai/blog/modular-m… We explore a fundamental understanding of the geometry of neural network optimization.

85 replies · 256 reposts · 2.5K likes · 286.5K views
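The core idea, co-designing the optimizer with a manifold constraint on each weight matrix, can be caricatured with the simplest possible constraint. The sketch below is a hypothetical illustration (unit-norm rows, enforced by projecting after each plain gradient step), not the construction from the blog post:

```python
import numpy as np

def project_rows(W):
    # Retraction onto the manifold of matrices with unit-norm rows.
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def constrained_step(W, grad, lr=0.1):
    # Plain gradient step followed by projection back onto the manifold:
    # the constraint holds exactly after every update, regardless of the
    # gradient magnitude, so per-layer weight scales cannot drift.
    return project_rows(W - lr * grad)

rng = np.random.default_rng(0)
W = project_rows(rng.normal(size=(8, 16)))
for _ in range(5):
    W = constrained_step(W, rng.normal(size=(8, 16)))
```

Keeping weight norms pinned by construction, rather than nudged by weight decay, is one way such constraints can stabilize training.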
Milad M retweeted
🚨 AI News | TestingCatalog @testingcatalog
BREAKING 🚨: Meta announced Meta Ray-Ban Display AI Glasses with an EMG Wristband! Did Zuck just kill the phone industry? 👀 Honestly, a wristband is a HUGE enabler, but there are significant questions about its quality.
53 replies · 71 reposts · 643 likes · 136.7K views
Milad M retweeted
Thinking Machines @thinkymachines
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first post is "Defeating Nondeterminism in LLM Inference". We believe that science is better when shared. Connectionism will cover topics as varied as our research: from kernel numerics to prompt engineering. Here we will share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI: it was the name of the 1980s subfield that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…
[image]
229 replies · 1.2K reposts · 7.6K likes · 3.5M views
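One root cause behind nondeterministic inference is that floating-point addition is not associative, so reductions that sum partial results in a different order (as parallel kernels can, from run to run or batch to batch) produce different bits. A two-line illustration:

```python
# The same three numbers summed in two orders give different float64 results.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
assert left != right
```

Tiny bit-level differences in logits can then flip a sampled token, after which the two completions diverge entirely.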
Milad M retweeted
a16z @a16z
.@drfeifei on why LLMs will struggle to solve spatial intelligence: "Language is fundamentally a purely generated signal." "You don't go out in nature and there's words written in the sky for you." "There is a 3D world out there that follows laws of physics... to fundamentally back that information out and be able to represent it and be able to generate it is just fundamentally quite a different problem."
a16z @a16z

The moment is right to push forward into a new frontier for AI — one that is as fundamental as language, says @drfeifei. That frontier is visual spatial intelligence. With Justin Johnson (@jcjohnss), her cofounder at @theworldlabs, and a16z's @martin_casado, Fei-Fei explains what unlocking this technology could mean, and why we’re in the midst of a “Cambrian explosion”:

55 replies · 202 reposts · 1.1K likes · 285.1K views