Brace(Hanyang) Zhao

81 posts

@OptionsGod_lgd

PhD student at @Columbia working on post-training generative models | ex-intern at @Netflix @CapitalOne

Manhattan, NY · Joined April 2021
881 Following · 123 Followers
Brace(Hanyang) Zhao@OptionsGod_lgd·
@yule_gan Interesting work! Curious whether the conclusion will hold for models beyond Qwen 2.5, e.g. Qwen 3? I remember the spurious-rewards paper from last year, whose observations also only held for Qwen 2.5. Could this be Qwen 2.5's magic?
Yulu Gan@yule_gan·
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu
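From the description in the tweet above (one Gaussian step on the weights, then ensembling), a minimal sketch might look like the following. This is my reading of the tweet, not the paper's actual RandOpt implementation; the function names, `sigma`, and majority voting over final answers are all assumptions.

```python
import numpy as np

def perturb_weights(params, sigma, rng):
    # One step of Gaussian noise on pretrained weights:
    # no iterations, no learning rate, no gradients.
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in params.items()}

def ensemble_answers(answers):
    # Ensemble the perturbed models by majority vote over final answers.
    values, counts = np.unique(np.array(answers), return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical usage: sample several perturbed copies of the model,
# run each on the task, and vote over their answers.
rng = np.random.default_rng(0)
params = {"w": np.zeros((2, 2))}
copies = [perturb_weights(params, sigma=0.01, rng=rng) for _ in range(8)]
print(ensemble_answers(["42", "41", "42"]))  # prints 42
```

The "Neural Thickets" claim, as I understand it, is that enough of these cheap perturbed copies land near diverse task experts that the vote becomes competitive with RL-trained policies.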
Brace(Hanyang) Zhao@OptionsGod_lgd·
@xidulu Believe me, getting team-matched twice and then rejected both times could feel worse 😭
Xidulu@xidulu·
Every intern application season, I had more papers, more citations, more internship experience, and offers from more prestigious places than the year before; one thing that never changes, however, is that the Google Student Researcher program NEVER gives me any sort of interview LOL
Hamish Ivison@hamishivi·
4.6 writes a plan and then says 'Excellent plan' to itself lmao
Brace(Hanyang) Zhao@OptionsGod_lgd·
@QPHutu I'd guess it is much easier to choose a proper clip ratio to start with (like 0.2 in common practice) than this absolute-difference delta? Any idea of delta's typical range, and is performance sensitive to delta?
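A rough sketch of the contrast the question raises: standard PPO clips the importance ratio (epsilon ≈ 0.2 is the common default), while the alternative, as I read the question, bounds an absolute probability difference by delta. The paper's exact objective may differ; the function names and the delta form below are my assumptions, for illustration only.

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    # Standard PPO surrogate: clip the importance ratio pi/pi_old
    # to [1 - eps, 1 + eps] before multiplying by the advantage.
    return np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)

def abs_diff_clip_objective(p_new, p_old, adv, delta=0.01):
    # Hypothetical alternative: bound the absolute probability change
    # |pi - pi_old| by delta instead of bounding the ratio. A sensible
    # delta now depends on how peaked the token distribution is, which
    # is exactly why its range is harder to pick than eps = 0.2.
    clipped = np.clip(p_new, p_old - delta, p_old + delta)
    ratio = p_new / p_old
    return np.minimum(ratio * adv, (clipped / p_old) * adv)
```

For a long-tailed vocabulary, a rare token with `p_old = 0.02` can triple in probability while staying inside a ratio clip of 1.2 never would allow; the absolute-difference form caps that move directly, which is presumably the motivation.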
Penghui Qi@QPHutu·
This time we should say goodbye to PPO/GRPO for real 👋 PPO is a great algorithm in classical RL settings. However, it is fundamentally flawed in LLM regime due to the large, long-tailed vocabulary.💔 Checkout our paper for more details👇
Brace(Hanyang) Zhao@OptionsGod_lgd·
Great work! We did something quite similar for image diffusion almost a year ago 🤣 In Rich Preference Optimization (arxiv.org/pdf/2503.11720), we train diffusion models by generating better images conditioned on feedback (like your second-turn answer), though we use DPO instead of online RL.
Yuda Song@yus167·
RL on LLMs inefficiently uses one scalar per rollout. But users regularly give much richer feedback: "make it formal," "step 3 is wrong." Can we train LLMs on this human-AI interaction? We introduce RL from Text Feedback, with 1) Self-Distillation; 2) Feedback Modeling (1/n) 🧵
Brace(Hanyang) Zhao@OptionsGod_lgd·
Cool work! We had an earlier, quite similar exploration of using VLMs to improve diffusion models in RPO (arxiv.org/pdf/2503.11720), but instead we create synthetic preference pairs by revising the original image according to critiques and revision suggestions provided by VLMs!
Kyle Sargent@KyleSargentAI·
Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ @GoogleDeepMind, @GoogleResearch, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:
Brace(Hanyang) Zhao@OptionsGod_lgd·
Really miss the good old times of RL and diffusion models: both fields involve interesting math, plus cool experiments (RL gym environments, image generation tasks) you can run to verify your ideas. Recent research and papers have a fundamentally different flavor....
Brace(Hanyang) Zhao@OptionsGod_lgd·
Happy moments from 2025: received a grant from @thinkymachines 🥹 Hoping to build something cool in the new year!
Brace(Hanyang) Zhao retweeted
Yang Song@DrYangSong·
Applications change, but the principles are enduring. After a year's hard work led by @JCJesseLai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.
Chieh-Hsin (Jesse) Lai@JCJesseLai

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

Brace(Hanyang) Zhao retweeted
Thinking Machines@thinkymachines·
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
Brace(Hanyang) Zhao retweeted
Thinking Machines@thinkymachines·
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
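For context on the comparison above: LoRA freezes the pretrained weight and learns a low-rank additive update, which is why it is so much cheaper than full fine-tuning. A minimal numpy sketch of the forward pass (shapes and the alpha/r scaling follow the original LoRA formulation; this is not the code behind the post):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=8):
    # LoRA keeps the pretrained weight W (d_out x d_in) frozen and learns
    # low-rank factors A (r x d_in) and B (d_out x r); the effective
    # weight is W + (alpha / r) * B @ A. Only A and B receive gradients.
    return x @ (W + (alpha / r) * (B @ A)).T

# Common init: A random, B zero, so training starts exactly at the
# pretrained model (the low-rank update is identically zero at step 0).
d_in, d_out, r = 4, 3, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))
x = np.ones((1, d_in))
# With B = 0, lora_forward(x, W, A, B) equals the frozen model x @ W.T.
```

The parameter saving comes from the shapes: A and B hold r * (d_in + d_out) values versus d_in * d_out for a full update, a large gap when r is small relative to the layer widths.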
Aviral Kumar@aviral_kumar2·
🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. 🧵⬇️
Brace(Hanyang) Zhao retweeted
机器之心 JIQIZHIXIN@jiqizhixin·
ByteDance is exploring diffusion LLMs too! 👀 Seed Diffusion Preview: a blazing-fast LLM for code, built on discrete-state diffusion. With 2,146 tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion, while matching their performance on standard code benchmarks. New SOTA on the speed–quality Pareto frontier. 🚀
Brace(Hanyang) Zhao retweeted
Google DeepMind@GoogleDeepMind·
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Brace(Hanyang) Zhao@OptionsGod_lgd·
Will be presenting this paper on Wednesday (July 16), 4:30-7:00 p.m. PDT, at West Exhibition Hall B2-B3, W-706! Would love to chat about diffusion models, RL, LLMs, or anything else!
Brace(Hanyang) Zhao@OptionsGod_lgd

Maybe a bit late, but I am thrilled to announce that our Scores as Actions (arxiv.org/abs/2502.01819) paper was accepted at #icml2025 🎉 We propose a continuous-time RL method for diffusion-model RLHF (diffusion models are naturally continuous-time) and perform better than discrete-time RL!
