Yi Dong
@doyend

21 posts

Joined June 2009
65 Following · 62 Followers
Yi Dong retweeted
Hao Zhang @HaoZhang3438830
Excited to introduce ProRL Agent: Rollout-as-a-Service for RL training of multi-turn LLM agents! 🚀

As we move toward complex agentic tasks, rollout infrastructure is often a bottleneck. We’re decoupling I/O-heavy rollouts from GPU training via a unified HTTP API.

Why ProRL Agent?
Decoupled & Scalable: Treats rollout as a service, allowing near-linear throughput scaling.
System-Level Optimization: Includes load balancing and automated sandbox cleanup for high stability.
Integrated: Now part of NVIDIA NeMo Gym to help researchers scale RL pipelines faster.

The Results 📈
On SWE-bench-Verified, we saw significant gains:
+8.4 on Qwen3-8B
+8.2 on Qwen3-14B
Proven success across STEM, Math, and General Coding agents.

Check out the research and open-source code:
📄 Paper: arxiv.org/pdf/2603.18815
💻 Repo: github.com/NVIDIA-NeMo/Pr…

Huge thanks to the team and NVIDIA for the support! 👏
Replies: 4 · Retweets: 20 · Likes: 136 · Views: 27.5K
Yi Dong retweeted
Ximing Lu @GXiming
We’re open-sourcing the data and model behind Golden Goose 🦢✨. Check them out and see how we turn unverifiable internet text 🌐 into large-scale RLVR tasks 😎.
📊 GooseReason-0.7M: huggingface.co/datasets/nvidi…
🤖 GooseReason-4B-Instruct: huggingface.co/nvidia/Nemotro…
Quoting Ximing Lu @GXiming:

There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐

Replies: 3 · Retweets: 34 · Likes: 266 · Views: 34K
Yi Dong retweeted
Ximing Lu @GXiming
There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐
Replies: 13 · Retweets: 66 · Likes: 394 · Views: 107.9K
Yi Dong retweeted
Zhilin Wang @wangzhilin123
You asked, and we listened! The @nvidia ProfBench leaderboard 🏆 is here on @huggingface: huggingface.co/spaces/nvidia/…

One design choice we made for the leaderboard is to distinguish open-weight vs. closed-source models and reasoning vs. instruct models. Separately, we also show the cost of running the entire benchmark (thanks to @openrouter for putting prices in one place), because real-world users absolutely care about price.

Putting this together with @viviennezhangx, we were surprised to find that open-weight models can sometimes perform at a similar level to closed-source models for cents on the dollar. 🤑

Thanks @ClementDelangue @imohitmayank for the amazing suggestion! What models do you want to see on there next? Comment below and I’ll run it (nothing crazy, though).

#ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC
Quoting Zhilin Wang @wangzhilin123:

We built ProfBench to raise the bar for LLMs, literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you’d expect from a PhD physicist or MBA consultant.

🌎 This isn’t just a dataset drop. It’s a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀.

🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focused work.

Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons.

🔗 ProfBench on @HuggingFace: huggingface.co/datasets/nvidi…
🔗 NeMo Evaluator SDK: github.com/NVIDIA-NeMo/Ev…

I’m so proud of the team that made this happen. Let’s keep pushing what AI can do.

Work done with @jaehunjung_com @GXiming @shizhediao Ellie Evans @jiaqizengggggg @PavloMolchanov @YejinChoinka @jankautz @doyend

#ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

Replies: 0 · Retweets: 3 · Likes: 6 · Views: 1.5K
Yi Dong retweeted
Zhilin Wang @wangzhilin123
We built ProfBench to raise the bar for LLMs, literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you’d expect from a PhD physicist or MBA consultant.

🌎 This isn’t just a dataset drop. It’s a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀.

🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focused work.

Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons.

🔗 ProfBench on @HuggingFace: huggingface.co/datasets/nvidi…
🔗 NeMo Evaluator SDK: github.com/NVIDIA-NeMo/Ev…

I’m so proud of the team that made this happen. Let’s keep pushing what AI can do.

Work done with @jaehunjung_com @GXiming @shizhediao Ellie Evans @jiaqizengggggg @PavloMolchanov @YejinChoinka @jankautz @doyend

#ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC
Replies: 3 · Retweets: 15 · Likes: 84 · Views: 52K
Yi Dong retweeted
Shizhe Diao @shizhediao
🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration

When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation, reviving stalled models by expanding exploration with large-N rollouts. 👇 (1/n)
Replies: 18 · Retweets: 44 · Likes: 202 · Views: 44.4K
Yi Dong retweeted
Shizhe Diao @shizhediao
Does RL truly expand a model’s reasoning 🧠 capabilities? Contrary to recent claims, the answer is yes, if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model 💥 and offering new insights into the debate.
Replies: 19 · Retweets: 68 · Likes: 423 · Views: 80.4K
Yi Dong retweeted
Oliver Stanley @_OliverStanley
Introducing Reasoning Gym: Over 100 procedurally generated reasoning environments for evaluation and RLVR of language models. Generate virtually infinite training or evaluation data with fine-grained difficulty control and automatic verifiers. 🧵 1/
Replies: 3 · Retweets: 42 · Likes: 274 · Views: 44.9K
Yi Dong retweeted
Jousef Murad @Jousefm2
⚡ 2D to Simulate 3D: Made that legendary Rubik's Cube even easier to understand ⚡
Replies: 36 · Retweets: 980 · Likes: 4.9K · Views: 0
Yi Dong retweeted
FeatureX @featurexai
We're live and ready at #nips2016! Come check out our booth for #swag and the chance to win a #drone!
Replies: 0 · Retweets: 3 · Likes: 2 · Views: 0