Yin Cui

425 posts


@YinCuiCV

Research Scientist @NVIDIA | Formerly @Google, @Cornell | Views are my own

Mountain View, CA · Joined October 2012
711 Following · 6.8K Followers
Yin Cui reposted
Luma
Luma@LumaLabsAI·
Uni-1 is here! A new kind of model that thinks and generates pixels simultaneously. Less artificial. More intelligent.
443 replies · 775 reposts · 4.7K likes · 6.4M views
Yin Cui reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
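One of the findings above is concrete enough to sketch: with parameterless QKnorm, attention logits are plain cosine similarities bounded in [-1, 1], so the softmax over many keys stays nearly uniform (too diffuse), and a scale multiplier restores dynamic range. A minimal illustration, with all names hypothetical and not nanochat's actual code:

```python
import math

def qk_norm_logit(q, k, scale):
    """Cosine-similarity attention logit with a sharpness scale.

    Parameterless QK-norm yields cos(q, k) in [-1, 1]; `scale` stands in
    for the learned multiplier the agent added. Sketch only.
    """
    nq = math.sqrt(sum(x * x for x in q))
    nk = math.sqrt(sum(x * x for x in k))
    cos = sum(a * b for a, b in zip(q, k)) / (nq * nk)
    return scale * cos

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Without a scale the matching key barely dominates; with scale=10 it does.
diffuse = softmax([qk_norm_logit(q, k, scale=1.0) for k in keys])
sharp = softmax([qk_norm_logit(q, k, scale=10.0) for k in keys])
```

Here `diffuse[0]` is only about 0.66 while `sharp[0]` exceeds 0.99, which is the "too diffuse" effect in miniature.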
974 replies · 2.1K reposts · 19.4K likes · 3.6M views
Yin Cui reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
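The loop described above (propose a change to the training script, run a complete short training, keep the change only if validation loss improves) can be sketched in a few lines. Everything here is a hypothetical stand-in, not the actual repo's code: the real agent edits a Python file and commits to a git branch, while this sketch just mutates a config dict.

```python
import random

def autoresearch(train, base_config, budget=100):
    """Minimal sketch of an autoresearch loop.

    `train(config)` is assumed to run one short (e.g. 5-minute) training
    and return the final validation loss. The random mutation stands in
    for the agent proposing an edit; accepted configs stand in for git
    commits on the feature branch. All names are hypothetical.
    """
    best_cfg, best_loss = dict(base_config), train(base_config)
    history = [(best_cfg, best_loss)]
    for _ in range(budget):
        cfg = dict(best_cfg)
        key = random.choice(list(cfg))        # agent picks a knob
        cfg[key] *= random.uniform(0.5, 2.0)  # agent proposes a change
        loss = train(cfg)                     # one complete training run
        if loss < best_loss:                  # keep only real improvements
            best_cfg, best_loss = cfg, loss   # stand-in for a git commit
            history.append((cfg, loss))
    return best_cfg, best_loss, history

# Toy usage: a "training" whose loss is minimized at lr = 0.1.
random.seed(0)
def toy_train(cfg):
    return (cfg["lr"] - 0.1) ** 2

best_cfg, best_loss, history = autoresearch(toy_train, {"lr": 0.5})
```

Comparing `history` across different mutation policies is the toy analogue of comparing research progress across different prompts or agents.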
1.1K replies · 3.6K reposts · 28.3K likes · 11M views
Yin Cui reposted
Jiaming Song
Jiaming Song@baaadas·
Excited to introduce Uni-1, our new *unified* multimodal model that does both understanding and generation: lumalabs.ai/uni-1 TLDR: I think Uni-1 @LumaLabsAI is > GPT Image 1.5 in many cases, and toe-to-toe with Nano Banana Pro/2. (showcase below)
29 replies · 53 reposts · 410 likes · 95.1K views
Yin Cui reposted
Citrini
Citrini@citrini·
JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn't disappoint. It exceeded every expectation. What happened? citriniresearch.com/p/2028gic
1.9K replies · 4.2K reposts · 27.7K likes · 28.6M views
Yin Cui reposted
Hongchi Xia
Hongchi Xia@hongchix·
Here we introduce SAGE: Scalable Agentic 3D Scene Generation for Embodied AI, which can generate sim-ready 3D scenes with agents following user demands at scale, ready for robotic action generation. Paper, code, and SAGE-10k dataset are all released! nvlabs.github.io/sage/
4 replies · 43 reposts · 312 likes · 27.4K views
Yin Cui reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
🎂 One year of Cosmos, and what a journey it's been. 🌑 We're celebrating with some incredible milestones: 🚀 5M total downloads across the Cosmos ecosystem 🧠 Cosmos Reason is the #1 model on the physical reasoning leaderboard with 2M+ downloads on @HuggingFace 🔮 Cosmos Predict is the #1 open model on the physical AI generation leaderboard with 2M+ downloads Thank you to our amazing developer community for making this possible. Here's to pushing the boundaries of world foundation models together! 📚 Explore Models & Datasets: nvda.ws/4qFP2zg 🧑🏻‍🍳 Read the Cosmos Cookbook: nvda.ws/4qevli8
12 replies · 34 reposts · 143 likes · 29.1K views
Yin Cui reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
NVIDIA Cosmos Reason 2 is here. 🥳 An open, highly accurate reasoning vision language model for physical AI, featuring: ✅ Improved spatio-temporal understanding and timestamp precision ✅ Flexible deployment with 2B and 8B model sizes ✅ Long-context reasoning with up to 256K tokens ✅ Expanded visual perception across complex environments We also have new Cosmos releases: Predict 2.5, Transfer 2.5, and the NVIDIA GR00T N1.6 robot foundation model. 📗Read our technical blog: nvda.ws/4swwC68 🤗 Download Cosmos Reason 2 on @HuggingFace: nvda.ws/3L4B6Qy
16 replies · 114 reposts · 628 likes · 45.7K views
Yin Cui reposted
AK
AK@_akhaliq·
World Simulation with Video Foundation Models for Physical AI
8 replies · 74 reposts · 395 likes · 68.2K views
Yin Cui reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
NVIDIA Cosmos open models made major progress.✨ ✅ Cosmos Predict 2.5 unifies text, image, and video world generation into one model that creates longer and more coherent simulations with improved grounding and efficiency. ✅ Cosmos Transfer 2.5 introduces precise, spatially controlled world transformations that are 3.5× smaller, faster, and higher in fidelity than before. Together, these models push the boundaries of physical AI, enabling robots and agents to learn, reason, and operate in dynamically simulated worlds. Read the @HuggingFace blog. 🔗huggingface.co/blog/nvidia/co… #NVIDIAGTC
10 replies · 34 reposts · 176 likes · 12.4K views
Yin Cui reposted
Stefano Ermon
Stefano Ermon@StefanoErmon·
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Chieh-Hsin (Jesse) Lai@JCJesseLai

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today's models work, why they work, and where they're heading. 🧵You'll find the link and a few highlights in the thread. We'd love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

13 replies · 133 reposts · 1.1K likes · 126.7K views
Yin Cui reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
Select a region in any image or video and get detailed captions instantly 💬 Our Describe Anything Model is transforming how we analyze visual content. 📹 See it in action when our #NVIDIAResearch team presents at #ICCV25. Register and view here ➡️ nvda.ws/499Ghbj
1 reply · 6 reposts · 42 likes · 2.9K views
Yin Cui reposted
OpenAI
OpenAI@OpenAI·
Sora 2 is here.
1.7K replies · 2.3K reposts · 20.8K likes · 9M views
Yin Cui reposted
NVIDIA AI
NVIDIA AI@NVIDIAAI·
With 3M+ downloads and counting, NVIDIA Cosmos is redefining physical AI. Announced at #CORL25, new Cosmos updates allow developers to generate diverse data to accelerate the training of robot models at scale. 👏 Cosmos Predict 2.5 will combine three models into one powerful model—reducing complexity, powering up to 30s video generation, and enabling multi-view simulations. 👏 Cosmos Transfer 2.5 will be 3.5x smaller yet faster and sharper—generating photorealistic synthetic data from 3D scenes or spatial inputs. 🔗 nvda.ws/48lW6eT
26 replies · 79 reposts · 321 likes · 40.2K views
Yin Cui reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
🎊1M reasons to celebrate.👏 Our developer community has taken NVIDIA Cosmos Reason to more than 1M downloads on @huggingface & the top spot on the Physical Reasoning Leaderboard. Join developers using Cosmos Reason to teach AI agents and robots to think like humans: ⚡ Get started with Cosmos Reason 1 NIM, an easy-to-use microservice for AI model deployment: catalog.ngc.nvidia.com/orgs/nim/teams… 📈 See the leaderboard: huggingface.co/spaces/faceboo…
4 replies · 16 reposts · 73 likes · 19.8K views
Yin Cui reposted
Thinking Machines
Thinking Machines@thinkymachines·
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…
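The nondeterminism the post's title refers to largely traces back to floating-point arithmetic not being associative, so the order in which a kernel reduces partial sums can change results bit-for-bit. A minimal CPU-side illustration (my own example, not taken from the blog post):

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order gives a different result. GPU kernels that vary
# their reduction order across runs inherit this nondeterminism.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right: 1e16 + 1.0 rounds back to 1e16 (the 1.0 is lost),
# so the running sum ends at 1.0.
left_to_right = sum(vals)

# Pairwise: the big terms cancel exactly first, so both 1.0s survive.
pairwise = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right, pairwise)  # 1.0 2.0
```

The same effect at the scale of attention and matmul reductions is why identical prompts can yield different logits across runs unless the reduction order is pinned down.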
230 replies · 1.3K reposts · 7.6K likes · 3.4M views
Yin Cui reposted
NVIDIA AI
NVIDIA AI@NVIDIAAI·
How do you teach an AI model to reason? 🤔 We are developing a set of tests that coach AI models to understand the physical world and apply common sense. These tests are used to develop reasoning models such as NVIDIA Cosmos Reason which just topped the physical reasoning leaderboard on @huggingface. 🤗 Read blog: nvda.ws/3JSRYIM
20 replies · 22 reposts · 145 likes · 86.9K views
Yin Cui
Yin Cui@YinCuiCV·
Excited to share that our Cosmos Reason1 model ranked 1st on Meta's physical reasoning leaderboard! 🥇 The leaderboard was recently introduced in V-JEPA 2 to track the progress of frontier models in physical understanding and reasoning. Download our model at: huggingface.co/nvidia/Cosmos-… #NVIDIACosmos
NVIDIA AI Developer@NVIDIAAIDev

Ranked #1 on @Meta's Physical Reasoning Leaderboard on @huggingface for a reason. 👏 🔥 🏆 Cosmos Reason enables robots and AI agents to reason like humans by leveraging prior knowledge, physics, and common sense to intelligently interact with the real world. This state-of-the-art reasoning VLM excels in physical AI applications like: 📊 Data curation and annotation 🤖 Robot planning and reasoning ▶️ Video analytics AI agents See the leaderboard → nvda.ws/4mLUmjd Check out Cosmos Reason → nvda.ws/425mMfF

1 reply · 0 reposts · 15 likes · 1.7K views
Qianqian Wang
Qianqian Wang@QianqianWang5·
📢Thrilled to share that I'll be joining Harvard and the Kempner Institute as an Assistant Professor starting Fall 2026! I'll be recruiting students this year for the Fall 2026 admissions cycle. Hope you apply!
Kempner Institute at Harvard University@KempnerInst

We are thrilled to share the appointment of @QianqianWang5 as a #KempnerInstitute Investigator! She will bring her expertise in computer vision to @Harvard. Read the announcement: bit.ly/4mIghHy @hseas #AI #ComputerVision

101 replies · 43 reposts · 746 likes · 111.5K views