HanRong YE @ Nvidia

397 posts


@leoyerrrr

Nvidia Research Scientist | Led OmniVinci, MM-Ego, X-VILA..

Santa Clara · Joined October 2012
1.9K Following · 972 Followers
Pinned Tweet
HanRong YE @ Nvidia @leoyerrrr
OmniVinci is now the #1 paper on Hugging Face!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems, leading to OmniVinci-9B, which introduces three key innovations:
- OmniAlignNet: a unified vision–audio alignment module powered by contrastive learning
- Temporal Embedding Grouping & Constrained Rotary Time Embedding: enabling absolute and relative temporal representation across multimodal tokens
- To support this, we curated a 24M-sample omni-modal dataset and developed a new large-scale data engine for efficient labeling.

🔍 Key findings:
- Audio understanding significantly enhances video comprehension
- Audio signals improve omni-modal reinforcement learning
- Modality-specific captioning falls short; true understanding demands omni-modal context

📈 Results: OmniVinci-9B outperforms Qwen2.5-Omni across omni-modal, vision, and audio benchmarks, using only 1/6 of the training tokens. #LLMs #AI #ML
11 replies · 27 reposts · 151 likes · 22.3K views
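The contrastive vision–audio alignment described in the pinned tweet can be sketched as a CLIP-style symmetric InfoNCE objective over paired embeddings. The following is a minimal NumPy illustration of that general technique under assumption; the function name and exact formulation are stand-ins, not the released OmniAlignNet implementation.

```python
import numpy as np

def info_nce(vision, audio, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired vision/audio embeddings.

    Hypothetical sketch of a CLIP-style contrastive alignment objective;
    not the actual OmniAlignNet code.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    logits = v @ a.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(v))          # matching pairs sit on the diagonal

    def xent(l):
        # Row-wise cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the vision->audio and audio->vision directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Correctly paired embeddings should yield a lower loss than mismatched ones, which is what pulls the two modalities into a shared space during training.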
HanRong YE @ Nvidia @leoyerrrr
NVIDIA is the company most driven to advance open-source research! 😃
vLLM @vllm_project

🎉 Congrats to @nvidia on the release of Nemotron 3 Super: day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on the Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget: dial accuracy vs. token cost per task
- 1M-token context window

Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes. Blog: vllm.ai/blog/nemotron-… 🤝 Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!

0 replies · 0 reposts · 4 likes · 208 views
HanRong YE @ Nvidia reposted
Shizhe Diao @shizhediao
Super excited to see Nemotron-CLIMBMix being used in the nanochat speedrun and making such a big difference 🚀 And yes, we also worried about potential goodharting when we first saw the results 😅 But the clustering + iterative mixing design seems to generalize quite robustly in our experiments. Really grateful to Andrej and the community for trying it out and pushing the idea further.✨
Andrej Karpathy @karpathy

nanochat now trains a GPT-2-capability model in just 2 hours on a single 8xH100 node (down from ~3 hours a month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was switching the dataset from FineWeb-Edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried OLMo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit, and enjoy the feeling of post-AGI :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall-clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.

0 replies · 4 reposts · 39 likes · 3.8K views
HanRong YE @ Nvidia reposted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Chinese New Year is rapidly becoming the AI researcher's favorite holiday
40 replies · 57 reposts · 1.3K likes · 140.7K views
underwood @underwoodxie96
WTF, I uploaded a screenshot from the One Piece manga and asked Seedance 2.0 to generate a video for me, and it actually worked! Prompt: "Video generated from reference text, with automatic coloring."
73 replies · 72 reposts · 646 likes · 208.5K views
Wenhao Chai @wenhaocha1
Prism might not only be released for humans; it's for the future AI "researchers". If AI can independently produce research that meets reasonably good standards and also has access to far more compute, then our existing evaluation systems are in a dangerous position. prism.openai.com
1 reply · 0 reposts · 24 likes · 2.1K views
HanRong YE @ Nvidia @leoyerrrr
NVIDIA is a truly fascinating place for AI researchers. Ask about anything, from GPU hardware to LLM architectures, and someone will have a great answer. Even better, people are eager to share their latest results. No wonder NVIDIA is "the" AI company. 😃
0 replies · 0 reposts · 9 likes · 322 views
HanRong YE @ Nvidia reposted
Pavlo Molchanov @PavloMolchanov
🚀 Top-1 on the GAIA agentic benchmark

Nemotron-ToolOrchestra just achieved #1 on GAIA, a benchmark focused on real agentic reasoning. The key idea is simple but powerful: train a small orchestration model that
- decomposes a task into smaller sub-tasks
- decides which tools or models to call
- sequences execution efficiently

Not one big model doing everything, but coordination.

Project page: research.nvidia.com/labs/lpr/ToolO…
Benchmark: huggingface.co/spaces/gaia-be…
Details in my previous post: x.com/PavloMolchanov…
5 replies · 8 reposts · 83 likes · 9.1K views
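The orchestration pattern described above (decompose a task, pick a tool per sub-task, sequence execution) can be sketched as a small dispatch loop. This is a hypothetical illustration: the planner rule and the tool names (`search`, `summarize`) are made up for the example, and in ToolOrchestra the planner's role is played by a trained orchestration model rather than a fixed rule.

```python
def plan(task):
    """Stand-in planner: split a task into (tool_name, sub_task) steps.

    A fixed rule here so the loop is runnable; the real system would have
    a learned model produce this decomposition.
    """
    if "search" in task:
        return [("search", task), ("summarize", task)]
    return [("summarize", task)]

# Hypothetical tool registry: each tool maps a sub-task string to an output.
TOOLS = {
    "search": lambda q: f"results for: {q}",
    "summarize": lambda q: f"summary of: {q}",
}

def orchestrate(task):
    """Run the planned steps in order, accumulating tool outputs."""
    context = []
    for tool_name, sub_task in plan(task):
        output = TOOLS[tool_name](sub_task)   # dispatch to the chosen tool
        context.append((tool_name, output))   # keep results for later steps
    return context
```

The point of the design is that the orchestrator stays small; capability comes from routing and sequencing calls to stronger tools and models rather than from one monolithic model.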
HanRong YE @ Nvidia @leoyerrrr
Catch our NVIDIA Omni LLM team (including me 😉) at #NeurIPS 2025 on Dec 3:
1. OmniVinci Social at the NVIDIA booth: 2:00–3:00 pm and 4:30–5:00 pm
2. NVIDIA Reception at Puesto (789 W Harbor Dr, Unit 155, San Diego): 6:00–10:00 pm
3. NVIDIA Omni LLMs Meetup (application full)
3 replies · 3 reposts · 27 likes · 3K views
HanRong YE @ Nvidia reposted
Shizhe Diao @shizhediao
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone's building agent workflows these days, connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings:
👉 Just prompting the agent workflow won't cut it. It's not how you build the best agent.
👉 Without learning, workflows plateau fast. It's time to bring RL fine-tuning 🔥 back into agent development. (1/n)
29 replies · 71 reposts · 347 likes · 67.4K views
Vishaal Udandarao @vishaal_urao
🚀 New paper! arxiv.org/abs/2511.16655
Recently, Cambrian-S released models & two benchmarks (VSR & VSC) for "spatial supersensing" in video! We found:
1️⃣ A simple no-frame baseline (NoSense) near-perfectly solves VSR!
2️⃣ A tiny sanity check collapses Cambrian-S performance to 0% on VSC! 🧵👇
5 replies · 23 reposts · 122 likes · 40.1K views