Harry Dong (@Real_HDong) - Twitter Profili | Zamantika Mersobahis Locabet

Sabitlenmiş Tweet

Harry Dong@Real_HDong·3 Eki

1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜arxiv.org/pdf/2510.01143

English

1

4

24

4.6K

Harry Dong retweetledi

Infini-AI-Lab@InfiniAILab·18 Şub

Video generation models are improving fast—real-time autoregressive models now deliver high quality at low latency, and they’re quickly being adopted for world models and robotics applications. So what’s the problem? They’re still too slow on consumer hardware. 🚀 What if we told you that we can get true real-time 16 FPS video generation on a single RTX 5090? (1.5-12x over FA 2/3/4 on 5090, H100, B200) Today we release MonarchRT 🦋, an efficient video attention that parameterizes attention maps as (tiled) Monarch matrices and delivers real E2E gains. 📄 Paper: arxiv.org/abs/2602.12271 🌐 Website: infini-ai-lab.github.io/MonarchRT 🔗 GitHub: github.com/Infini-AI-Lab/… 🧵1/n

English

4

27

132

32.9K

Harry Dong retweetledi

Infini-AI-Lab@InfiniAILab·10 Şub

RL is notoriously unstable under actor–policy mismatch 😥 — a common reality caused by kernel differences, MoE randomness, FP8 rollouts, or asynchronous pipelines. But here’s a crazy thought 🤔 👉 What if you could RL-train a large model using rollouts generated only by a weaker, faster, and completely different model? Sounds doomed from the start? 💩 We are releasing Jackpot 🎰.💡 enabling training Qwen3-8B-Base using only Qwen3-1.7B-Base generated rollouts ✨ Jackpot is surprisingly powerful: • Enables cheap, fast rollouts to train stronger models • Dramatically changes the cost–performance tradeoff of RL training We release Jackpot 🎰 in the following format: 🌔Paper: arxiv.org/abs/2602.06107 🌕Code: github.com/Infini-AI-Lab/… 🌖Blog: infini-ai-lab.github.io/jpt_website/ [1/n]

English

6

22

124

23.5K

Harry Dong@Real_HDong·22 Oca

Very neat work led by @RJ_Sadhukhan to make LLMs more efficient, sparse, and interpretable!

Infini-AI-Lab@InfiniAILab

Lookup memories are having a moment 😄 The whale 🐋 #deepseek dropped engram… and we dropped up-projections from our FFNs…perfect timing 😅 🥳 Introducing STEM: Scaling Transformers with Embedding Modules 🌱 A scalable way to boost parametric memory with extra perks: ✅ Stable training even at extreme sparsity ✅ Better quality for fewer training FLOPs (knowledge + reasoning + long-context gains) ✅ Efficient inference: ~33% FFN params removed + CPU offload & async prefetch ✅ More interpretable → seamless knowledge editing 🔧🧠 Looking forward to DeepSeek v4… feels like we’ve only scratched the surface of embedding-lookup scaling 👀 📄Paper: arxiv.org/abs/2601.10639 🌐 Website: infini-ai-lab.github.io/STEM 🔗 GitHub: github.com/Infini-AI-Lab/…

English

0

5

170

Harry Dong retweetledi

Lucas Beyer (bl16)@giffmana·17 Oca

How timely!

Rosinality@rosinality

Replacing the up projection in FFNs with token embeddings. As cited in the paper it feels similar to a hash based router for MoEs. Again Engram could be the beginning of studies around embeddings.

English

4

22

272

35.7K

Harry Dong@Real_HDong·2 Ara

At NeurIPS all week! Swing by the Efficient Reasoning workshop at 10:45-11:00 on Saturday to hear my oral presentation about our work on interdependent sampling for parallel generation!

Harry Dong@Real_HDong

1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜arxiv.org/pdf/2510.01143

English

0

1

6

306

Harry Dong retweetledi

Yuejie Chi@yuejiec·8 Kas

This will be presented as an oral talk at NeurIPS Workshop Efficient Reasoning! Stop by to listen to Harry if you plan to attend NeurIPS. Harry is also on the job market this year! @Real_HDong

Harry Dong@Real_HDong

1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜arxiv.org/pdf/2510.01143

English

0

2

5

1K

Harry Dong retweetledi

Rohan Choudhury@rchoudhury997·23 Eki

Excited to release our new preprint - we introduce Adaptive Patch Transformers (APT), a method to speed up vision transformers by using multiple different patch sizes within the same image!

English

10

28

232

29.7K

Harry Dong retweetledi

David Brandfonbrener@brandfonbrener·16 Eki

New paper! I had a lot of fun working on scaling up RL with @agarwl_ @Devvrit_Khatri @louvishh and others. Check it out!

Devvrit@Devvrit_Khatri

Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs

English

3

6

69

11.8K

Harry Dong retweetledi

Infini-AI-Lab@InfiniAILab·7 Eki

🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and performant even when using data stale by 256 model updates. 🔗 Notion Blog: m2po.notion.site/rl-stale-m2po 📄 Paper: arxiv.org/abs/2510.01161 💻 GitHub: github.com/Infini-AI-Lab/… 🧵 1/4

English

3

39

233

62.6K

Harry Dong@Real_HDong·3 Eki

9/🧵 More details in the full paper (Generalized Parallel Scaling with Interdependent Generations): arxiv.org/pdf/2510.01143 Big thank you to all my collaborators @brandfonbrener, @ErykHelenowski, Yun He, Mrinal Kumar, @Han_Fang_, @yuejiec, @karthikabinav

English

0

4

180

Harry Dong@Real_HDong·3 Eki

8/🧵 ✨Key takeaway: By treating LLM features for parallel scaling as a single tensor unit instead of independent slices, each response can give/take info to/from other responses to improve individual response AND response set quality while maintaining total parallelism.

English

1

0

135

Harry Dong@Real_HDong·3 Eki

1/🧵 🎉Introducing Bridge🌉, our parallel LLM inference scaling method that shares info between all responses to an input prompt throughout the generation process! Bridge greatly improves the quality of individual responses and the entire response set! 📜arxiv.org/pdf/2510.01143

English

1

4

24

4.6K

Harry Dong retweetledi

Infini-AI-Lab@InfiniAILab·17 Haz

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n

GIF

English

6

76

221

120.5K

Harry Dong retweetledi

Infini-AI-Lab@InfiniAILab·14 Şub

🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple – Over-reliant on retrieval & QA. ⛳ Detectable Noise – RAG can filter out filler text easily. ⛳ Too Few – High-quality data requires huge human effort. 🔭 With LC LLMs hitting the ceiling, we need a benchmark that justifies their insane training costs. 🔥 Introducing 🐭🐷 GSM-Infinite – our synthetic long-context reasoning benchmark built to push LLMs to their real limits. 💎 Infinitely scalable in reasoning complexity & quantity 💎 Precision control over reasoning complexity 💎 Fully customizable RAG-proof context lengths 🚀 [1/n] 📄Paper: arxiv.org/abs/2502.05252 🖥️Code: github.com/Infini-AI-Lab/… 🤗Huggingface datasets: huggingface.co/collections/In… 🏃Leaderboard: huggingface.co/spaces/InfiniA…

English

6

37

188

98.4K

Harry Dong

Keşfet