Ziteng Sun

83 posts

@SZiteng

Responsible and efficient AI. Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

NYC · Joined February 2015
412 Following · 638 Followers
Ziteng Sun retweeted
Jeff Dean@JeffDean·
⚡ Excited to announce Gemini 3.1 Flash-Lite! We’ve set a new standard for efficiency and capability to give developers our fastest, most cost-effective Gemini 3 model yet. We engineered this model with thinking levels, allowing it to handle high-volume queries instantly, while scaling up its reasoning for complex edge cases.

By the numbers:
⏱️ 2.5X faster time-to-first-token than 2.5 Flash while being significantly higher quality
📉 $0.25 per 1M input tokens
📊 1432 Elo on LMArena & 86.9% on GPQA Diamond

Thrilled to see what developers build with this kind of speed and quality at scale. Available now in Google AI Studio and Vertex AI. blog.google/innovation-and…
69 replies · 122 reposts · 1.3K likes · 117.8K views
Ziteng Sun retweeted
Mert Cemri@mertcemri·
Introducing SPECS (SPECulative test-time Scaling), a test-time scaling (TTS) algorithm with a Pareto-frontier latency/accuracy trade-off. Scaling test-time compute improves LLM reasoning but imposes a latency overhead. Prior work optimizes TTS accuracy as a function of FLOPS; we propose to further reduce latency by addressing the memory bottleneck of LLM inference through speculative drafts. See a breakdown of the method below. (1/n) 🧵 👇
5 replies · 24 reposts · 108 likes · 21.6K views
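The draft-then-verify pattern that speculative approaches like this build on can be sketched in a few lines. Both "models" below are hypothetical stand-ins (toy deterministic rules over integer tokens), not the SPECS implementation:

```python
# Toy sketch of draft-and-verify: a cheap draft model proposes k tokens,
# the large target model checks them in a single pass, and the longest
# agreeing prefix (plus the target's correction) is kept.

def draft_model(prefix, k):
    # Cheap proposer: a toy deterministic rule over integer "tokens".
    return [(prefix + i) % 10 for i in range(1, k + 1)]

def target_model(prefix, k):
    # Expensive verifier: agrees with the draft except on the last position.
    return [(prefix + i) % 10 for i in range(1, k)] + [(prefix + k + 1) % 10]

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    verified = target_model(prefix, k)
    accepted = []
    for p, v in zip(proposed, verified):
        accepted.append(v)          # the target's token is always kept
        if p != v:                  # first disagreement ends the step
            break
    return accepted

print(speculative_step(3))
```

When the draft agrees often, each target pass yields several tokens instead of one, which is where the latency savings come from.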
Ziteng Sun@SZiteng·
I will be at NeurIPS from today to Dec. 7th. Excited to meet old and new friends at the conference. Happy to chat about anything related to LLM efficiency, RL, and differential privacy. #NeurIPS2025 At the Wednesday noon session (11 AM – 2PM), I will be presenting our spotlight work: Private Set Union with Multiple Contributions (#1314), where we establish fundamental limits on the utility of discovering set unions privately, and how we can bypass the limit by leveraging a prediction. Joint work with awesome collaborators at Google Research: Travis Dick, Haim Kaplan, Alex Kulesza, Uri Stemmer, and @th33rtha.
0 replies · 0 reposts · 7 likes · 508 views
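For context on the problem setting, here is a hedged sketch of a standard private-set-union baseline (NOT the paper's algorithm): cap each user's contributions, add Laplace noise to the item counts, and release only items whose noisy count clears a threshold. The parameters `epsilon`, `cap`, and `threshold` are illustrative:

```python
import math
import random
from collections import Counter

def laplace_noise(scale, rng):
    # Inverse-CDF sample from Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_set_union(user_sets, epsilon=1.0, cap=1, threshold=5.0, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for items in user_sets:
        for item in sorted(items)[:cap]:   # limit each user's contribution
            counts[item] += 1
    # With at most `cap` contributions per user, sensitivity is `cap`.
    scale = cap / epsilon
    return {item for item, c in counts.items()
            if c + laplace_noise(scale, rng) > threshold}

# Common items survive the noisy threshold; rare ones are suppressed.
released = private_set_union([{"the", "a"}] * 20 + [{"rare"}])
print(released)
```

The thresholding is what protects privacy but also what limits utility: items contributed by few users are lost, which is the tension the spotlight paper's limits and prediction-based bypass speak to.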
Ziteng Sun retweeted
Google DeepMind@GoogleDeepMind·
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
212 replies · 1.1K reposts · 6.5K likes · 1.7M views
Ziteng Sun retweeted
Ahmad Beirami@abeirami·
The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt. Let me elaborate on what I mean by that and a cheaper way of doing it offline.
11 replies · 53 reposts · 658 likes · 117.6K views
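The calibration point can be made concrete: with several rollouts per prompt, the group's mean (and std) of rewards acts as a per-prompt baseline, so each rollout's advantage is measured relative to that prompt's own difficulty rather than a global value estimate. The reward values below are toy numbers, not a real reward model:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: center by the group mean and scale by the
    # group std, computed over the rollouts of a single prompt.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

rollout_rewards = [1.0, 0.0, 0.0, 1.0]   # four rollouts of one prompt
print(group_relative_advantages(rollout_rewards))
```

A prompt where every rollout succeeds (or every one fails) yields zero advantage everywhere, so the policy gradient concentrates on prompts the model can partially solve.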
Ziteng Sun retweeted
Ahmad Beirami@abeirami·
Happening now at poster E-2804. Come talk to us about why reward calibration is key to alignment and how to do RLHF for test-time scaling.
Ahmad Beirami@abeirami

[Thu Jul 17] w/ @ananthbshankar & @jacobeisenstein, we present a reinforcement learning framework in view of test-time scaling. We show how to optimally calibrate & transform rewards to obtain optimal performance with a given test-time algorithm. x.com/SZiteng/status…

1 reply · 2 reposts · 20 likes · 3.5K views
Ziteng Sun@SZiteng·
[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check out our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and myself.
1 reply · 3 reposts · 14 likes · 1.3K views
Banghua Zhu@BanghuaZ·
Excited to share that I’m joining NVIDIA as a Principal Research Scientist! We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed to open-sourcing our work and sharing it with the world. Let’s build a stronger, more open AI community together!
141 replies · 96 reposts · 2.5K likes · 249.7K views
Zhaolin Gao@GaoZhaolin·
Current RLVR methods like GRPO and PPO require explicit critics or multiple generations per prompt, resulting in high computational and memory costs. We introduce ⭐A*-PO, a policy optimization algorithm that uses only a single sample per prompt during online RL, without a critic.
7 replies · 29 reposts · 224 likes · 29.9K views
Ziteng Sun@SZiteng·
@abeirami Congratulations on all the amazing achievements. I am super grateful for the opportunity to be part of the journey and learn from you. Looking forward to your amazing achievements to come.
1 reply · 0 reposts · 6 likes · 749 views
Ahmad Beirami@abeirami·
After three incredible years, today is my last day at Google DeepMind! I am truly grateful to the amazing colleagues who made the journey 1000x more fruitful and enjoyable! I am forever indebted to my collaborators who showed me how to be better at everything via demonstrations.
38 replies · 13 reposts · 757 likes · 85.4K views
Ziteng Sun retweeted
Ziteng Sun@SZiteng·
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model, creating a train/inference mismatch. Can we align our model to better suit a given inference-time procedure? We answer this affirmatively; check out the thread below.
5 replies · 50 reposts · 258 likes · 67.5K views
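A minimal Best-of-N sketch makes the train/inference mismatch easy to see: draw N candidate responses from the policy and keep the one a reward model scores highest. The policy and reward below are toy stand-ins; standard RLHF optimizes the policy as if this selection step did not exist:

```python
import random

def best_of_n(sample, reward, n, rng):
    # Draw n candidates and return the highest-reward one.
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

toy_policy = lambda rng: rng.randint(0, 100)   # stand-in for sampling a response
toy_reward = lambda x: -abs(x - 42)            # stand-in reward model
print(best_of_n(toy_policy, toy_reward, n=8, rng=random.Random(0)))
```

Because only the argmax candidate is ever returned, a policy trained to be good *on average* is not the same as one trained to be good *after selection*, which is the gap inference-aware alignment targets.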
Ziteng Sun retweeted
Nived Rajaraman@Nived_Rajaraman·
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025
1 reply · 29 reposts · 86 likes · 33.5K views
Hongyang Zhang@hongyangzh·
@SZiteng Yes, we have studied this case. This is the experiment done by the SGLang team @ispobaoke in the SGLang environment. The draft length is 3 and there is no draft tree.
1 reply · 0 reposts · 2 likes · 131 views
Hongyang Zhang@hongyangzh·
Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration!
- 5x🚀 than vanilla (on HF)
- 1.4x🚀 than EAGLE-2 (on HF)
- A record of ~400 TPS on LLama 3.1 8B with a single H100 (on SGLang)
- 1.65x🚀 in latency even for large bs=64 (on SGLang)
- A new scaling law: more training data, better speedup
- Apache 2.0

Paper: arxiv.org/abs/2503.01840
Code: github.com/SafeAILab/EAGLE
SGLang version: github.com/sgl-project/sg…

⚒️ Takeaway: Introducing training-time test, a novel draft-model training technique: we replace feature prediction with direct token prediction and shift from top-layer-only features to multi-layer feature fusion. This approach unlocks a new scaling law previously undiscovered in EAGLE and EAGLE-2.

🙏 Acknowledgments: We would like to thank the SGLang team (@zhyncs42 @lm_zheng @ying11231 @JamesLiuID, @ispobaoke, and others @lmsysorg) for their merge and careful evaluation of EAGLE-3 on SGLang.

🤝 Want to collaborate? We're a small academic group with limited GPU resources. If you're interested in supporting our next version of EAGLE or would like us to train a preliminary version tailored to a specific model, please get in touch!

Joint work with Yuhui Li, Fangyun Wei, and Chao Zhang.
14 replies · 43 reposts · 298 likes · 41.2K views
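The "multi-layer feature fusion" idea mentioned in the thread can be sketched abstractly: instead of conditioning the draft head on the top layer only, hidden states from several layers are concatenated and projected back to the model width. The dimensions and the projection matrix below are purely illustrative, not EAGLE-3's actual architecture:

```python
def fuse_features(layer_states, proj):
    # layer_states: one hidden-state vector per tapped layer (one token).
    # Concatenate across layers, then project back to the model width.
    concat = [x for state in layer_states for x in state]    # length L * d
    d_out = len(proj[0])
    return [sum(concat[i] * proj[i][j] for i in range(len(concat)))
            for j in range(d_out)]                           # length d_out

d = 4                                        # toy model width
low, mid, top = [0.1] * d, [0.2] * d, [0.3] * d
# Toy projection that just sums the three layers' features per dimension.
proj = [[1.0 if i % d == j else 0.0 for j in range(d)] for i in range(3 * d)]
print(fuse_features([low, mid, top], proj))
```

The point of fusing layers is that lower layers carry token-level signal the top layer has already abstracted away, which a token-predicting draft head can exploit.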