Yin Cui

425 posts

Yin Cui

@YinCuiCV

Research Scientist @NVIDIA | Formerly @Google, @Cornell | Views are my own

Mountain View, CA Katılım Ekim 2012

711 Takip Edilen6.8K Takipçiler

Yin Cui retweetledi

Luma@LumaLabsAI·23 Mar

Uni-1 is here! A new kind of model that thinks and generates pixels simultaneously. Less artificial. More intelligent.

English

443

775

4.7K

6.4M

Yin Cui retweetledi

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.: - It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work. - It found that the Value Embeddings really like regularization and I wasn't applying any (oops). - It found that my banded attention was too conservative (i forgot to tune it). - It found that AdamW betas were all messed up. - It tuned the weight decay schedule. - It tuned the network initialization. This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train. py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English

974

2.1K

19.4K

3.6M

Yin Cui retweetledi

Andrej Karpathy@karpathy·7 Mar

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

English

1.1K

3.6K

28.3K

11M

Yin Cui retweetledi

Jiaming Song@baaadas·6 Mar

Excited to introduce Uni-1, our new *unified* multimodal model that does both understanding and generation: lumalabs.ai/uni-1 TLDR: I think Uni-1 @LumaLabsAI is > GPT Image 1.5 in many cases, and toe-to-toe with Nano Banana Pro/2. (showcase below)

English

410

95.1K

Yin Cui retweetledi

Citrini@citrini·22 Şub

JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn’t disappoint. It exceeded every expectation. What happened? citriniresearch.com/p/2028gic

English

1.9K

4.2K

27.7K

28.6M

Yin Cui retweetledi

Hongchi Xia@hongchix·11 Şub

Here we introduce SAGE: Scalable Agentic 3D Scene Generation for Embodied AI, which can generate sim-ready 3D scenes with agents following user demands at scale, ready for robotic action generation. Paper, code, and SAGE-10k dataset are all released! nvlabs.github.io/sage/

English

312

27.4K

Yin Cui retweetledi

NVIDIA AI Developer@NVIDIAAIDev·7 Oca

🎂 One year of Cosmos, and what a journey it's been. 🌑 We're celebrating with some incredible milestones: 🚀 5M total downloads across the Cosmos ecosystem 🧠 Cosmos Reason is the #1 model on the physical reasoning leaderboard with 2M+ downloads on @HuggingFace 🔮 Cosmos Predict is the #1 open model on the physical AI generation leaderboard with 2M+ downloads Thank you to our amazing developer community for making this possible. Here's to pushing the boundaries of world foundation models together! 📚 Explore Models & Datasets: nvda.ws/4qFP2zg 🧑🏻‍🍳 Read the Cosmos Cookbook: nvda.ws/4qevli8

English

143

29.1K

Yin Cui retweetledi

NVIDIA AI Developer@NVIDIAAIDev·6 Oca

NVIDIA Cosmos Reason 2 is here. 🥳 An open, highly accurate reasoning vision language model for physical AI, featuring: ✅ Improved spatio-temporal understanding and timestamp precision ✅ Flexible deployment with 2B and 8B model sizes ✅ Long-context reasoning with up to 256K tokens ✅ Expanded visual perception across complex environments We also have new Cosmos releases: Predict 2.5, Transfer 2.5, and the NVIDIA GR00T N1.6 robot foundation model. 📗Read our technical blog: nvda.ws/4swwC68 🤗 Download Cosmos Reason 2 on @HuggingFace: nvda.ws/3L4B6Qy

English

114

628

45.7K

Yin Cui retweetledi

AK@_akhaliq·4 Kas

World Simulation with Video Foundation Models for Physical AI

English

395

68.2K

Yin Cui retweetledi

NVIDIA AI Developer@NVIDIAAIDev·29 Eki

NVIDIA Cosmos open models made major progress.✨ ✅ Cosmos Predict 2.5 unifies text, image, and video world generation into one model that creates longer and more coherent simulations with improved grounding and efficiency. ✅ Cosmos Transfer 2.5 introduces precise, spatially controlled world transformations that are 3.5× smaller, faster, and higher in fidelity than before. Together, these models push the boundaries of physical AI, enabling robots and agents to learn, reason, and operate in dynamically simulated worlds. Read the @HuggingFace blog. 🔗huggingface.co/blog/nvidia/co… #NVIDIAGTC

English

176

12.4K

Yin Cui retweetledi

Stefano Ermon@StefanoErmon·29 Eki

Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!

Chieh-Hsin (Jesse) Lai@JCJesseLai

Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

English

133

1.1K

126.7K

Yin Cui retweetledi

NVIDIA AI Developer@NVIDIAAIDev·17 Eki

Select a region in any image or video and get detailed captions instantly 💬 Our Describe Anything Model is transforming how we analyze visual content. 📹 See it in action when our #NVIDIAResearch team presents at #ICCV25. Register and view here ➡️ nvda.ws/499Ghbj

English

2.9K

Yin Cui retweetledi

OpenAI@OpenAI·30 Eyl

Sora 2 is here.

English

1.7K

2.3K

20.8K

Yin Cui retweetledi

NVIDIA AI@NVIDIAAI·29 Eyl

With 3M+ downloads and counting, NVIDIA Cosmos is redefining physical AI. Announced at #CORL25, new Cosmos updates are allowing developers to generate diverse data for accelerating training robot models at scale. 👏 Cosmos Predict 2.5 will combine three models into one powerful model—reducing complexity, powering up to 30s video generation, and enabling multi-view simulations. 👏 Cosmos Transfer 2.5 will be 3.5x smaller yet faster and sharper—generating photorealistic synthetic data from 3D scenes or spatial inputs. 🔗 nvda.ws/48lW6eT

English

321

40.2K

Yin Cui retweetledi

NVIDIA AI Developer@NVIDIAAIDev·25 Eyl

🎊1M reasons to celebrate.👏 Our developer community has taken NVIDIA Cosmos Reason to more than 1M downloads on @huggingface & the top spot on the Physical Reasoning Leaderboard. Join developers using Cosmos Reason to teach AI agents and robots to think like humans: ⚡ Get started with Cosmos Reason 1 NIM, an easy-to-use microservice for AI model deployment: catalog.ngc.nvidia.com/orgs/nim/teams… 📈 See the leaderboard: huggingface.co/spaces/faceboo…

English

19.8K

Yin Cui retweetledi

Thinking Machines@thinkymachines·10 Eyl

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…

English

230

1.3K

7.6K

3.4M

Yin Cui@YinCuiCV·10 Eyl

🚀 Try Cosmos-Reason1 — a reasoning VLM for Physical AI! Already 800K+ downloads!

NVIDIA Omniverse@nvidiaomniverse

Give your robots the power of human-like reasoning. 🦾 NVIDIA Cosmos Reason, a 7-billion-parameter reasoning vision-language model, applies prior knowledge, common sense, and physics understanding to critique synthetic data generated in NVIDIA Omniverse—helping train embodied AI systems such as robots and autonomous vehicles to act more realistically. Watch the full demo: nvda.ws/46a0BGs Get started with Cosmos Reason: nvda.ws/4mViJuL

English

2.2K

Yin Cui retweetledi

NVIDIA AI@NVIDIAAI·28 Ağu

How do you teach an AI model to reason? 🤔 We are developing a set of tests that coach AI models to understand the physical world and apply common sense. These tests are used to develop reasoning models such as NVIDIA Cosmos Reason which just topped the physical reasoning leaderboard on @huggingface. 🤗 Read blog: nvda.ws/3JSRYIM

GIF

English

145

86.9K

Yin Cui@YinCuiCV·28 Ağu

Excited to share that our Cosmos Reason1 model ranked 1st on Meta's physical reasoning leaderboard! 🥇 The leaderboard was recently introduced in V-JEPA 2 to track the progress of frontier models in physical understanding and reasoning. Download our model at: huggingface.co/nvidia/Cosmos-… #NVIDIACosmos

NVIDIA AI Developer@NVIDIAAIDev

Ranked #1 on @Meta's Physical Reasoning Leaderboard on @huggingface for a reason. 👏 🔥 🏆 Cosmos Reason enables robots and AI agents to reason like humans by leveraging prior knowledge, physics, and common sense to intelligently interact with the real world. This state-of-the-art reasoning VLM excels in physical AI applications like: 📊 Data curation and annotation 🤖 Robot planning and reasoning ▶️ Video analytics AI agents See the leaderboard → nvda.ws/4mLUmjd Check out Cosmos Reason → nvda.ws/425mMfF

English

1.7K

Yin Cui@YinCuiCV·15 Ağu

@QianqianWang5 Congratulations @QianqianWang5 !

English

469

Qianqian Wang@QianqianWang5·15 Ağu

📢Thrilled to share that I'll be joining Harvard and the Kempner Institute as an Assistant Professor starting Fall 2026! I'll be recruiting students this year for the Fall 2026 admissions cycle. Hope you apply!

Kempner Institute at Harvard University@KempnerInst

We are thrilled to share the appointment of @QianqianWang5 as an #KempnerInstitute Investigator! She will bring her expertise in computer vision to @Harvard. Read the announcement: bit.ly/4mIghHy @hseas #AI #ComputerVision

English

101

746

111.5K

Keşfet

@LumaLabsAI @HuggingFace @huggingface @QianqianWang5 @elonmusk @BarackObama @taylorswift13 @cristiano