HanRong YE @ Nvidia

397 posts


@leoyerrrr

Nvidia Research Scientist | Led OmniVinci, MM-Ego, X-VILA..

Santa Clara · Joined October 2012
1.9K Following · 972 Followers
Pinned Tweet
HanRong YE @ Nvidia @leoyerrrr
OmniVinci is now the #1 paper on Hugging Face!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems, leading to OmniVinci-9B, which introduces three key innovations:
- OmniAlignNet: a unified vision–audio alignment module powered by contrastive learning
- Temporal Embedding Grouping & Constrained Rotary Time Embedding: enabling absolute and relative temporal representation across multimodal tokens
- To support this, we curated a 24M-sample omni-modal dataset and developed a new large-scale data engine for efficient labeling.

🔍 Key findings:
- Audio understanding significantly enhances video comprehension
- Audio signals improve omni-modal reinforcement learning
- Modality-specific captioning falls short; true understanding demands omni-modal context

📈 Results: OmniVinci-9B outperforms Qwen2.5-Omni across omni-modal, vision, and audio benchmarks, using only 1/6 of the training tokens. #LLMs #AI #ML
11 replies · 27 reposts · 151 likes · 22.3K views
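The contrastive vision–audio alignment described in the pinned tweet can be sketched as a CLIP-style symmetric InfoNCE objective over paired embeddings. The following is a minimal NumPy illustration of that general technique under assumption; the function name and exact formulation are stand-ins, not the released OmniAlignNet implementation.

```python
import numpy as np

def info_nce(vision, audio, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired vision/audio embeddings.

    Hypothetical sketch of a CLIP-style contrastive alignment objective;
    not the actual OmniAlignNet code.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    logits = v @ a.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(v))          # matching pairs sit on the diagonal

    def xent(l):
        # Row-wise cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the vision->audio and audio->vision directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Correctly paired embeddings should yield a lower loss than mismatched ones, which is what pulls the two modalities into a shared space during training.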
HanRong YE @ Nvidia @leoyerrrr
NVIDIA is the company most driven to advance open-source research! 😃
vLLM @vllm_project

🎉 Congrats to @nvidia on the release of Nemotron 3 Super: day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on the Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget: dial accuracy vs. token cost per task
- 1M-token context window

Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes. Blog: vllm.ai/blog/nemotron-… 🤝 Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!

0 replies · 0 reposts · 4 likes · 208 views
HanRong YE @ Nvidia reposted
Shizhe Diao @shizhediao
Super excited to see Nemotron-CLIMBMix being used in the nanochat speedrun and making such a big difference 🚀 And yes, we also worried about potential goodharting when we first saw the results 😅 But the clustering + iterative mixing design seems to generalize quite robustly in our experiments. Really grateful to Andrej and the community for trying it out and pushing the idea further.✨
Andrej Karpathy @karpathy

nanochat now trains a GPT-2-capability model in just 2 hours on a single 8xH100 node (down from ~3 hours a month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was switching the dataset from FineWeb-Edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried OLMo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit, and enjoy the feeling of post-AGI :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall-clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup", where I optimize and tune the agent flows, than on the nanochat repo directly.

0 replies · 4 reposts · 39 likes · 3.8K views
HanRong YE @ Nvidia reposted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Chinese New Year is rapidly becoming the AI researcher's favorite holiday
40 replies · 57 reposts · 1.3K likes · 140.7K views
underwood @underwoodxie96
WTF, I uploaded a screenshot from the One Piece manga and asked Seedance 2.0 to generate a video for me, and it actually worked! Prompt: "Video generated from reference text, with automatic coloring."
73 replies · 72 reposts · 646 likes · 208.5K views
Wenhao Chai @wenhaocha1
Prism might not only be released for humans; it's for the future AI "researchers". If AI can independently produce research that meets reasonably good standards and also has access to far more compute, then our existing evaluation systems are in a dangerous position. prism.openai.com
1 reply · 0 reposts · 24 likes · 2.1K views
HanRong YE @ Nvidia @leoyerrrr
NVIDIA is a truly fascinating place for AI researchers. Ask about anything, from GPU hardware to LLM architectures, and someone will have a great answer. Even better, people are eager to share their latest results. No wonder NVIDIA is "the" AI company. 😃
0 replies · 0 reposts · 9 likes · 322 views
HanRong YE @ Nvidia reposted
Pavlo Molchanov @PavloMolchanov
🚀 Top-1 on the GAIA agentic benchmark

Nemotron-ToolOrchestra just achieved #1 on GAIA, a benchmark focused on real agentic reasoning. The key idea is simple but powerful: train a small orchestration model that
- decomposes a task into smaller sub-tasks
- decides which tools or models to call
- sequences execution efficiently

Not one big model doing everything, but coordination.

Project page: research.nvidia.com/labs/lpr/ToolO…
Benchmark: huggingface.co/spaces/gaia-be…
Details in my previous post: x.com/PavloMolchanov…
5 replies · 8 reposts · 83 likes · 9.1K views
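The orchestration pattern described above (decompose a task, pick a tool per sub-task, sequence execution) can be sketched as a small dispatch loop. This is a hypothetical illustration: the planner rule and the tool names (`search`, `summarize`) are made up for the example, and in ToolOrchestra the planner's role is played by a trained orchestration model rather than a fixed rule.

```python
def plan(task):
    """Stand-in planner: split a task into (tool_name, sub_task) steps.

    A fixed rule here so the loop is runnable; the real system would have
    a learned model produce this decomposition.
    """
    if "search" in task:
        return [("search", task), ("summarize", task)]
    return [("summarize", task)]

# Hypothetical tool registry: each tool maps a sub-task string to an output.
TOOLS = {
    "search": lambda q: f"results for: {q}",
    "summarize": lambda q: f"summary of: {q}",
}

def orchestrate(task):
    """Run the planned steps in order, accumulating tool outputs."""
    context = []
    for tool_name, sub_task in plan(task):
        output = TOOLS[tool_name](sub_task)   # dispatch to the chosen tool
        context.append((tool_name, output))   # keep results for later steps
    return context
```

The point of the design is that the orchestrator stays small; capability comes from routing and sequencing calls to stronger tools and models rather than from one monolithic model.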
HanRong YE @ Nvidia @leoyerrrr
Catch our NVIDIA Omni LLM team (including me 😉) at #NeurIPS 2025 on Dec 3:
1. OmniVinci Social at the NVIDIA booth: 2:00–3:00 pm and 4:30–5:00 pm
2. NVIDIA Reception at Puesto (789 W Harbor Dr, Unit 155, San Diego): 6:00–10:00 pm
3. NVIDIA Omni LLMs Meetup (application full)
3 replies · 3 reposts · 27 likes · 3K views
HanRong YE @ Nvidia reposted
Shizhe Diao @shizhediao
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone's building agent workflows these days, connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings:
👉 Just prompting the agent workflow won't cut it. It's not how you build the best agent.
👉 Without learning, workflows plateau fast. It's time to bring RL fine-tuning 🔥 back into agent development. (1/n)
29 replies · 71 reposts · 347 likes · 67.4K views
Vishaal Udandarao @vishaal_urao
🚀 New paper! arxiv.org/abs/2511.16655
Recently, Cambrian-S released models & two benchmarks (VSR & VSC) for "spatial supersensing" in video! We found:
1️⃣ A simple no-frame baseline (NoSense) near-perfectly solves VSR!
2️⃣ A tiny sanity check collapses Cambrian-S performance to 0% on VSC! 🧵👇
5 replies · 23 reposts · 122 likes · 40.1K views