Yi Dong (@doyend) - Twitter Profile

Yi Dong retweeted

Hao Zhang@HaoZhang3438830·28 Mar

Excited to introduce ProRL Agent: Rollout-as-a-Service for RL training of multi-turn LLM agents! 🚀

As we move toward complex agentic tasks, rollout infrastructure is often a bottleneck. We’re decoupling I/O-heavy rollouts from GPU training via a unified HTTP API.

Why ProRL Agent?
Decoupled & Scalable: Treats rollout as a service, allowing near-linear throughput scaling.
System-Level Optimization: Includes load balancing and automated sandbox cleanup for high stability.
Integrated: Now part of NVIDIA NeMo Gym to help researchers scale RL pipelines faster.

The Results 📈
On SWE-bench-Verified, we saw significant gains:
+8.4 on Qwen3-8B
+8.2 on Qwen3-14B
Proven success across STEM, Math, and General Coding agents.

Check out the research and open-source code:
📄 Paper: arxiv.org/pdf/2603.18815
💻 Repo: github.com/NVIDIA-NeMo/Pr…

Huge thanks to the team and NVIDIA for the support! 👏

Yi Dong retweeted

Ximing Lu@GXiming·2 Mar

We’re open-sourcing the data and model behind Golden Goose 🦢✨. Check them out and see how we turn unverifiable internet text 🌐 into large-scale RLVR tasks 😎. 📊 GooseReason-0.7M: huggingface.co/datasets/nvidi… 🤖 GooseReason-4B-Instruct: huggingface.co/nvidia/Nemotro…

Ximing Lu@GXiming

There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐

Yi Dong retweeted

Ximing Lu@GXiming·3 Feb

There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐

Yi Dong retweeted

Zhilin Wang@wangzhilin123·30 Oct

You asked and we listened! The @nvidia ProfBench leaderboard 🏆 is here on @huggingface: huggingface.co/spaces/nvidia/… One design choice for the leaderboard is that we distinguish open-weight vs. closed-source models and reasoning vs. instruct models. Separately, we also show the cost of running the entire benchmark (thanks to @openrouter for putting prices in one place) because real-world users absolutely care about prices. Putting this together with @viviennezhangx, we were surprised to find that open-weight models can sometimes perform at a similar level to closed-source models but at cents on the dollar. 🤑 Thanks @ClementDelangue @imohitmayank for the amazing suggestion! Which models do you want to see on there next? Comment below and I’ll run them (nothing crazy though). #ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

Zhilin Wang@wangzhilin123

We built ProfBench to raise the bar for LLMs - literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you'd expect from a PhD physicist or MBA consultant. 🌎 This isn’t just a dataset drop. It’s a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀. 🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focussed work. Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons. 🔗 ProfBench on @HuggingFace huggingface.co/datasets/nvidi… 🔗 NeMo Evaluator SDK github.com/NVIDIA-NeMo/Ev… I’m so proud of the team that made this happen. Let’s keep pushing what AI can do. Work done with @jaehunjung_com @GXiming @shizhediao Ellie Evans @jiaqizengggggg @PavloMolchanov @YejinChoinka @jankautz @doyend #ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

Yi Dong retweeted

Zhilin Wang@wangzhilin123·28 Oct

We built ProfBench to raise the bar for LLMs - literally. At @NVIDIA, we worked with domain experts to create a benchmark that goes far beyond trivia and short answers. ProfBench tests LLMs on complex, multi-step tasks that demand the kind of reasoning, synthesis, and clarity you'd expect from a PhD physicist or MBA consultant. 🌎 This isn’t just a dataset drop. It’s a global collaboration: 38 professionals across 8 countries contributed over 7,000 expert-written rubrics across finance MBA 💵, consulting MBA 📊, chemistry PhD 🧪 and physics PhD 🚀. 🧗 Every prompt and grading rubric was handcrafted, requiring tens of hours of dedicated and focussed work. Now fully supported in the NeMo Evaluator SDK, ProfBench enables reproducible, rubric-based evaluations and side-by-side model comparisons. 🔗 ProfBench on @HuggingFace huggingface.co/datasets/nvidi… 🔗 NeMo Evaluator SDK github.com/NVIDIA-NeMo/Ev… I’m so proud of the team that made this happen. Let’s keep pushing what AI can do. Work done with @jaehunjung_com @GXiming @shizhediao Ellie Evans @jiaqizengggggg @PavloMolchanov @YejinChoinka @jankautz @doyend #ProfBench #LLM #AIevaluation #NeMo #NVIDIA #OpenSourceAI #AIresearch #AgenticAI #GenerativeAI #BuiltByExperts #GTCDC

Yi Dong retweeted

Shizhe Diao@shizhediao·7 Oct

🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation—reviving stalled models by expanding exploration with large-N rollouts. 👇 (1/n)

Yi Dong retweeted

Shizhe Diao@shizhediao·2 Jun

Does RL truly expand a model’s reasoning 🧠 capabilities? Contrary to recent claims, the answer is yes, if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model 💥 and offering new insights into the debate.

Yi Dong retweeted

Oliver Stanley@_OliverStanley·2 Jun

Introducing Reasoning Gym: Over 100 procedurally generated reasoning environments for evaluation and RLVR of language models. Generate virtually infinite training or evaluation data with fine-grained difficulty control and automatic verifiers. 🧵 1/

Yi Dong@doyend·16 Mar

linkedin.com/feed/update/ur…

Yi Dong retweeted

Jousef Murad@Jousefm2·21 Nov

⚡ 2D to simulate 3D: the legendary Rubik's Cube made even easier to understand ⚡

Yi Dong@doyend·5 Jul

I really like the muTransfer paper (microsoft.com/en-us/research…). To help me understand the paper better, I wrote a blog post deriving some of the equations missing from the paper: yidong72.github.io/mu-transfer-pa… Thank you @TheGregYang for the wonderful theoretical work!

Yi Dong@doyend·16 Dec

Good explanation of the ReBeL paper

Noam Brown@polynoamial

I just watched this video and was super impressed by how well @ykilcher communicated the essence of our paper. If you want to understand why AlphaZero can't play poker and why ReBeL can, this is a great video to watch!

Yi Dong retweeted

RAPIDS AI@RAPIDSai·8 Oct

Learn how to achieve a 100x speedup using @numba_jit and @rapidsai for efficient and fast fractional differencing computation on #GPUs. medium.com/rapids-ai/fast…

Yi Dong@doyend·17 Jul

Happy to take any questions about this work

RAPIDS AI@RAPIDSai

Learn how you can achieve up to 20x speedup in your Quant workflow by leveraging #gQuant, a set of finance examples built on RAPIDS. nvda.ws/2ll11qW

Yi Dong retweeted

NVIDIA AI Developer@NVIDIAAIDev·17 Jul

To help researchers and data scientists in #finance accelerate their workflows with @rapidsai, we've published a new technical post highlighting a few #gQuant finance examples demonstrating the value of GPU accelerated #datascience: nvda.ws/2ldXDxT