Ziteng Sun

83 posts

@SZiteng

Responsible and efficient AI. Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

NYC · Joined February 2015
412 Following · 638 Followers
Ziteng Sun retweeted
Jeff Dean@JeffDean·
⚡ Excited to announce Gemini 3.1 Flash-Lite! We’ve set a new standard for efficiency and capability to give developers our fastest, most cost-effective Gemini 3 model yet. We engineered this model with thinking levels, allowing it to handle high-volume queries instantly, while scaling up its reasoning for complex edge cases.

By the numbers:
⏱️ 2.5X faster time-to-first-token than 2.5 Flash while being significantly higher quality
📉 $0.25 per 1M input tokens
📊 1432 Elo on LMArena & 86.9% on GPQA Diamond

Thrilled to see what developers build with this kind of speed and quality at scale. Available now in Google AI Studio and Vertex AI. blog.google/innovation-and…
69 replies · 122 reposts · 1.3K likes · 117.8K views
Ziteng Sun retweeted
Mert Cemri@mertcemri·
Introducing SPECS (SPECulative test-time Scaling), a test-time scaling (TTS) algorithm with a Pareto-frontier latency/accuracy trade-off. Scaling test-time compute improves LLM reasoning but imposes a latency overhead. Prior work optimizes TTS accuracy as a function of FLOPS; we propose to further reduce latency by addressing the memory bottleneck of LLM inference through speculative drafts. See a breakdown of the method below. (1/n) 🧵 👇
5 replies · 24 reposts · 108 likes · 21.6K views
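The draft-then-verify pattern that speculative approaches like this build on can be sketched in a few lines. Both "models" below are hypothetical stand-ins (toy deterministic rules over integer tokens), not the SPECS implementation:

```python
# Toy sketch of draft-and-verify: a cheap draft model proposes k tokens,
# the large target model checks them in a single pass, and the longest
# agreeing prefix (plus the target's correction) is kept.

def draft_model(prefix, k):
    # Cheap proposer: a toy deterministic rule over integer "tokens".
    return [(prefix + i) % 10 for i in range(1, k + 1)]

def target_model(prefix, k):
    # Expensive verifier: agrees with the draft except on the last position.
    return [(prefix + i) % 10 for i in range(1, k)] + [(prefix + k + 1) % 10]

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    verified = target_model(prefix, k)
    accepted = []
    for p, v in zip(proposed, verified):
        accepted.append(v)          # the target's token is always kept
        if p != v:                  # first disagreement ends the step
            break
    return accepted

print(speculative_step(3))
```

When the draft agrees often, each target pass yields several tokens instead of one, which is where the latency savings come from.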
Ziteng Sun@SZiteng·
I will be at NeurIPS from today to Dec. 7th. Excited to meet old and new friends at the conference. Happy to chat about anything related to LLM efficiency, RL, and differential privacy. #NeurIPS2025 At the Wednesday noon session (11 AM – 2PM), I will be presenting our spotlight work: Private Set Union with Multiple Contributions (#1314), where we establish fundamental limits on the utility of discovering set unions privately, and how we can bypass the limit by leveraging a prediction. Joint work with awesome collaborators at Google Research: Travis Dick, Haim Kaplan, Alex Kulesza, Uri Stemmer, and @th33rtha.
0 replies · 0 reposts · 7 likes · 508 views
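For context on the problem setting, here is a hedged sketch of a standard private-set-union baseline (NOT the paper's algorithm): cap each user's contributions, add Laplace noise to the item counts, and release only items whose noisy count clears a threshold. The parameters `epsilon`, `cap`, and `threshold` are illustrative:

```python
import math
import random
from collections import Counter

def laplace_noise(scale, rng):
    # Inverse-CDF sample from Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_set_union(user_sets, epsilon=1.0, cap=1, threshold=5.0, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for items in user_sets:
        for item in sorted(items)[:cap]:   # limit each user's contribution
            counts[item] += 1
    # With at most `cap` contributions per user, sensitivity is `cap`.
    scale = cap / epsilon
    return {item for item, c in counts.items()
            if c + laplace_noise(scale, rng) > threshold}

# Common items survive the noisy threshold; rare ones are suppressed.
released = private_set_union([{"the", "a"}] * 20 + [{"rare"}])
print(released)
```

The thresholding is what protects privacy but also what limits utility: items contributed by few users are lost, which is the tension the spotlight paper's limits and prediction-based bypass speak to.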
Ziteng Sun retweeted
Google DeepMind@GoogleDeepMind·
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
212 replies · 1.1K reposts · 6.5K likes · 1.7M views
Ziteng Sun retweeted
Ahmad Beirami@abeirami·
The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt. Let me elaborate on what I mean by that and a cheaper way of doing it offline.
11 replies · 53 reposts · 658 likes · 117.6K views
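The calibration point can be made concrete: with several rollouts per prompt, the group's mean (and std) of rewards acts as a per-prompt baseline, so each rollout's advantage is measured relative to that prompt's own difficulty rather than a global value estimate. The reward values below are toy numbers, not a real reward model:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: center by the group mean and scale by the
    # group std, computed over the rollouts of a single prompt.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

rollout_rewards = [1.0, 0.0, 0.0, 1.0]   # four rollouts of one prompt
print(group_relative_advantages(rollout_rewards))
```

A prompt where every rollout succeeds (or every one fails) yields zero advantage everywhere, so the policy gradient concentrates on prompts the model can partially solve.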
Ziteng Sun retweeted
Ahmad Beirami@abeirami·
Happening now at poster E-2804. Come talk to us about why reward calibration is key to alignment and how to do RLHF for test-time scaling.
Ahmad Beirami@abeirami

[Thu Jul 17] w/ @ananthbshankar & @jacobeisenstein, we present a reinforcement learning framework in view of test-time scaling. We show how to optimally calibrate & transform rewards to obtain optimal performance with a given test-time algorithm. x.com/SZiteng/status…

1 reply · 2 reposts · 20 likes · 3.5K views
Ziteng Sun@SZiteng·
[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check out our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and myself.
1 reply · 3 reposts · 14 likes · 1.3K views
Banghua Zhu@BanghuaZ·
Excited to share that I’m joining NVIDIA as a Principal Research Scientist! We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed to open-sourcing our work and sharing it with the world. Let’s build a stronger, more open AI community together!
141 replies · 96 reposts · 2.5K likes · 249.7K views
Zhaolin Gao@GaoZhaolin·
Current RLVR methods like GRPO and PPO require explicit critics or multiple generations per prompt, resulting in high computational and memory costs. We introduce ⭐A*-PO, a policy optimization algorithm that uses only a single sample per prompt during online RL, without a critic.
7 replies · 29 reposts · 224 likes · 29.9K views
Ziteng Sun@SZiteng·
@abeirami Congratulations on all the amazing achievements. I am super grateful for the opportunity to be part of the journey and learn from you. Looking forward to your amazing achievements to come.
1 reply · 0 reposts · 6 likes · 749 views
Ahmad Beirami@abeirami·
After three incredible years, today is my last day at Google DeepMind! I am truly grateful to the amazing colleagues who made the journey 1000x more fruitful and enjoyable! I am forever indebted to my collaborators who showed me how to be better at everything via demonstrations.
38 replies · 13 reposts · 757 likes · 85.4K views
Ziteng Sun retweeted
Ziteng Sun@SZiteng·
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model, creating a train/inference mismatch. Can we align our model to better suit a given inference-time procedure? We answer this affirmatively; check out the thread below.
5 replies · 50 reposts · 258 likes · 67.5K views
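A minimal Best-of-N sketch makes the train/inference mismatch easy to see: draw N candidate responses from the policy and keep the one a reward model scores highest. The policy and reward below are toy stand-ins; standard RLHF optimizes the policy as if this selection step did not exist:

```python
import random

def best_of_n(sample, reward, n, rng):
    # Draw n candidates and return the highest-reward one.
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

toy_policy = lambda rng: rng.randint(0, 100)   # stand-in for sampling a response
toy_reward = lambda x: -abs(x - 42)            # stand-in reward model
print(best_of_n(toy_policy, toy_reward, n=8, rng=random.Random(0)))
```

Because only the argmax candidate is ever returned, a policy trained to be good *on average* is not the same as one trained to be good *after selection*, which is the gap inference-aware alignment targets.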
Ziteng Sun retweeted
Nived Rajaraman@Nived_Rajaraman·
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025
1 reply · 29 reposts · 86 likes · 33.5K views
Hongyang Zhang@hongyangzh·
@SZiteng Yes, we have studied this case. This is the experiment done by the SGLang team @ispobaoke in the SGLang environment. The draft length is 3 and there is no draft tree.
1 reply · 0 reposts · 2 likes · 131 views
Hongyang Zhang@hongyangzh·
Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration!
- 5x🚀 than vanilla (on HF)
- 1.4x🚀 than EAGLE-2 (on HF)
- A record of ~400 TPS on LLama 3.1 8B with a single H100 (on SGLang)
- 1.65x🚀 in latency even for large bs=64 (on SGLang)
- A new scaling law: more training data, better speedup
- Apache 2.0

Paper: arxiv.org/abs/2503.01840
Code: github.com/SafeAILab/EAGLE
SGLang version: github.com/sgl-project/sg…

⚒️ Takeaway: Introducing training-time test, a novel draft-model training technique: we replace feature prediction with direct token prediction and shift from top-layer-only features to multi-layer feature fusion. This approach unlocks a new scaling law previously undiscovered in EAGLE and EAGLE-2.

🙏 Acknowledgments: We would like to thank the SGLang team (@zhyncs42 @lm_zheng @ying11231 @JamesLiuID, @ispobaoke, and others @lmsysorg) for their merge and careful evaluation of EAGLE-3 on SGLang.

🤝 Want to collaborate? We're a small academic group with limited GPU resources. If you're interested in supporting our next version of EAGLE or would like us to train a preliminary version tailored to a specific model, please get in touch!

Joint work with Yuhui Li, Fangyun Wei, and Chao Zhang.
14 replies · 43 reposts · 298 likes · 41.2K views
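The "multi-layer feature fusion" idea mentioned in the thread can be sketched abstractly: instead of conditioning the draft head on the top layer only, hidden states from several layers are concatenated and projected back to the model width. The dimensions and the projection matrix below are purely illustrative, not EAGLE-3's actual architecture:

```python
def fuse_features(layer_states, proj):
    # layer_states: one hidden-state vector per tapped layer (one token).
    # Concatenate across layers, then project back to the model width.
    concat = [x for state in layer_states for x in state]    # length L * d
    d_out = len(proj[0])
    return [sum(concat[i] * proj[i][j] for i in range(len(concat)))
            for j in range(d_out)]                           # length d_out

d = 4                                        # toy model width
low, mid, top = [0.1] * d, [0.2] * d, [0.3] * d
# Toy projection that just sums the three layers' features per dimension.
proj = [[1.0 if i % d == j else 0.0 for j in range(d)] for i in range(3 * d)]
print(fuse_features([low, mid, top], proj))
```

The point of fusing layers is that lower layers carry token-level signal the top layer has already abstracted away, which a token-predicting draft head can exploit.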