Han Guo

3.4K posts

@HanGuo97

PhD Student @MIT_CSAIL | Past: @togethercompute @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.

Joined August 2016
4.5K Following · 4.2K Followers
Han Guo reposted
Rulin Shao
Rulin Shao@RulinShao·
DR Tulu is now accepted for an oral presentation at #ICML2026 🙏 Updated paper: arxiv.org/abs/2511.19399 📥We added more ablations including using Qwen3-8B as the rubric generator&judge, showing evolving rubrics work with a weak model too; spurious rewards sanity check, etc. Live demo: dr-tulu.org Code&models: github.com/rlresearch/dr-…
Rulin Shao@RulinShao

Happy to share that DR Tulu has been accepted to ICML as a ✨Spotlight✨! We believe that co-evolving the agent and its reward metric can lead to more capable intelligence. DR Tulu is a team effort. Huge thanks and congrats to all my amazing collaborators and mentors!

2
23
151
11.3K
Han Guo reposted
Yoav Gelberg
Yoav Gelberg@yoav_gelberg·
Excited about this new work. As KV compaction becomes increasingly important, we ask whether it’s worth adapting the model itself to perform better under compaction. Turns out, it can really matter.
Yam Eitan@ytn_ym

1/ How much can you compress an LLM’s KV cache? tl;dr it depends on how you train your model. Many strong context compaction methods, such as Cartridges and attention matching, operate post-hoc: given a fixed model and a context, they try to compress the resulting KV cache. @yoav_gelberg and I ask the complementary question: can we train the model to produce KV representations that are easier to compress? In other words: keep the compression method fixed, and change the representations it sees.

3
24
134
18.2K
Rohit Agarwal
Rohit Agarwal@Rohit_Writes·
@HanGuo97 Claude Code goal mode is a good start, or even just trying a simple genetic algorithm with an AI refiner could work. Unfortunately, none of the platforms are very mature yet.
1
0
1
55
Han Guo
Han Guo@HanGuo97·
LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).
15
100
675
189.4K
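The post above describes fusing memory-bound ops into the matmul epilogue. The following is a minimal CPU-side sketch of that general idea, not CODA itself: the tile size `TILE`, the bias-plus-ReLU epilogue, and every function name are assumptions chosen for illustration. It contrasts a two-pass version (matmul writes its result, a second kernel re-reads it) with a fused version that applies the op while the output tile is still held locally.

```python
# Illustrative simulation only; TILE, relu_bias, and all names are hypothetical.
TILE = 2

def relu_bias(x, b):
    """Example elementwise op: add a per-column bias, then ReLU."""
    return max(x + b, 0.0)

def matmul_tile(A, B, r0, c0, tile):
    """Compute one output tile of A @ B (stands in for the on-chip accumulator)."""
    K = len(B)
    rows = range(r0, min(r0 + tile, len(A)))
    cols = range(c0, min(c0 + tile, len(B[0])))
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in cols] for i in rows]

def unfused(A, B, bias):
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    # Pass 1: the matmul stores its raw result to "global memory".
    for r0 in range(0, M, TILE):
        for c0 in range(0, N, TILE):
            acc = matmul_tile(A, B, r0, c0, TILE)
            for di, row in enumerate(acc):
                for dj, v in enumerate(row):
                    C[r0 + di][c0 + dj] = v
    # Pass 2: a separate memory-bound kernel re-reads all of C.
    return [[relu_bias(C[i][j], bias[j]) for j in range(N)] for i in range(M)]

def fused(A, B, bias):
    M, N = len(A), len(B[0])
    out = [[0.0] * N for _ in range(M)]
    for r0 in range(0, M, TILE):
        for c0 in range(0, N, TILE):
            acc = matmul_tile(A, B, r0, c0, TILE)
            # Epilogue: apply the op to the tile before its single store.
            for di, row in enumerate(acc):
                for dj, v in enumerate(row):
                    out[r0 + di][c0 + dj] = relu_bias(v, bias[c0 + dj])
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
bias = [-2.0, 0.5]
result = fused(A, B, bias)
```

Both versions produce the same values; the fused one simply avoids the extra write and re-read of the intermediate matrix, which is where a memory-bound op would otherwise spend its time.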
Rohit Agarwal
Rohit Agarwal@Rohit_Writes·
@HanGuo97 Now that LLMs can author kernels using Claude Code, you ever try throwing autoresearch on top of this?
1
0
0
386
Han Guo reposted
Mingkai Deng
Mingkai Deng@mdeng34·
This is a prototype using language-based world models. Stay tuned for our next steps on multimodal and physical world models. The concept of a configurator, which decides when and how deeply to engage a reasoning process, is not specific to planning, but extensible to learning and adaptation going forward.
📄 SR²AM: arxiv.org/abs/2605.22138
📄 SiRA: arxiv.org/abs/2507.23773
🌐 Project: sailing-lab.github.io/sr2am-self-reg…
💻 Code: github.com/sailing-lab/sr…
🤗 SR²AM-v0.1-8B: huggingface.co/sailing-lab/SR…
🤗 SR²AM-v1.0-30B: huggingface.co/sailing-lab/SR…
Joint work with @jinyuhou0, @larasnevess, @varad0309, @tw_killian, @waterluffy, @ericxing
2
10
59
4.2K
Han Guo reposted
Mingkai Deng
Mingkai Deng@mdeng34·
Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
3
45
274
59.9K
Han Guo reposted
Ryan Bahlous-Boldi
Ryan Bahlous-Boldi@RyanBoldi·
Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test-time search performance, even on the original scalar.
34
119
844
201.1K
Han Guo
Han Guo@HanGuo97·
@silverhawk_ny Another good question. The key idea is to split the reduction into tile-level reductions followed by a separate reduction pass. FWIW, you could do the whole dimension reduction but that might involve atomics.
1
0
0
9
arch rock
arch rock@silverhawk_ny·
@HanGuo97 Another question: for a GEMM epilogue with a reduction kernel, we usually need to hold the GEMM results in registers for the whole feature-dimension reduction, which can cause register pressure. Do you have any novel algorithms to solve this?
1
0
0
8
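The exchange above mentions splitting a reduction into tile-level reductions followed by a separate reduction pass. Here is a small sketch of that pattern under stated assumptions: the reduction is a row-wise sum of squares over the GEMM output (the kind of statistic a normalization layer might need), and `tile_partials`, `finalize`, and the tile width are hypothetical names and values, not taken from the thread. Each column tile reduces only its own slice, so no step holds a full output row at once, and the partials land in distinct slots rather than a shared accumulator.

```python
# Illustrative sketch only; function names and the sum-of-squares
# reduction are assumptions made for this example.

def tile_partials(A, B, tile):
    """Pass 1: per (row, column-tile) partial sum of squares of A @ B."""
    M, K, N = len(A), len(B), len(B[0])
    n_tiles = (N + tile - 1) // tile
    partials = [[0.0] * n_tiles for _ in range(M)]
    for i in range(M):
        for t in range(n_tiles):
            cols = range(t * tile, min((t + 1) * tile, N))
            # Only one tile's worth of GEMM output is live at a time.
            acc = [sum(A[i][k] * B[k][j] for k in range(K)) for j in cols]
            # Each tile writes its own slot, so no atomics are needed.
            partials[i][t] = sum(v * v for v in acc)
    return partials

def finalize(partials):
    """Pass 2: reduce the small partials buffer across tiles."""
    return [sum(row) for row in partials]

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
partials = tile_partials(A, B, tile=2)
row_sumsq = finalize(partials)
```

The trade-off is an extra, much smaller pass over the partials buffer. The single-pass alternative mentioned in the reply would have every tile accumulate into one shared value per row, which is where atomics could come in.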
Han Guo reposted
Jyo Pari
Jyo Pari@jyo_pari·
The computational abstractions humans developed are great for building architectures, however they’re not necessarily the right abstractions for kernels. Han shows why 🔥
Han Guo@HanGuo97

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).

0
3
21
3.1K
Han Guo
Han Guo@HanGuo97·
@silverhawk_ny Good question! The epilogue could use TMA to load/store auxiliary data.
0
0
0
12
arch rock
arch rock@silverhawk_ny·
@HanGuo97 Is there a good way to use TMA to pipeline warp specialization between the GEMM and epilogue ops?
1
0
0
21
Han Guo
Han Guo@HanGuo97·
@DaviJin Let’s make Blackwell GPUs go brrrrr
1
0
4
214
Han Guo reposted
Diyi Yang
Diyi Yang@Diyi_Yang·
The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :) With all students in my cs329x Human-Centered LLM class, we present 60+ pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵
14
71
281
44.6K