Lihao Sun
@1e0sun
Research resident at Microsoft. prev. @UChicago CS & CogSci.

Joined January 2023
209 Following · 178 Followers

Lihao Sun @1e0sun
awesome point - we did verify. We compared autoregressive decoding (KV cache), full-sequence forward (our two-pass), and truncated forward under deterministic settings. Tokens: perfect match across all three methods. Hidden states: not bit-identical (float16 computation-order differences), but worst-case cosine sim > 0.99 and KL < 0.001 - way below anything that would affect probe accuracy or PCA geometry. So yes, mathematically equivalent via causal masking, with tiny floating-point discrepancies. Not an issue for our observations, but be careful in settings where high precision matters. We'll probably add these details to the appendix - thanks!
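
A minimal sketch of this kind of check, assuming a Hugging Face causal LM; the model name, prompt, and the choice of comparing only last-layer states are illustrative, not necessarily the paper's exact protocol:

```python
# Sketch: compare hidden states from step-by-step decoding against one
# full-sequence forward pass. Model, prompt, and length are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; any HF causal LM should do
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).eval()

ids = tok("2 + 2 =", return_tensors="pt").input_ids
with torch.no_grad():
    # Pass 1: greedy decoding, keeping the last-layer state at every step.
    out = model.generate(ids, max_new_tokens=16, do_sample=False,
                         output_hidden_states=True, return_dict_in_generate=True)
    step_states = torch.cat([h[-1][:, -1:, :] for h in out.hidden_states], dim=1)

    # Pass 2: one forward over prompt + output; slice out the positions that
    # produced each generated token.
    full = model(out.sequences, output_hidden_states=True)
    full_states = full.hidden_states[-1][:, ids.shape[1] - 1 : -1, :]

cos = F.cosine_similarity(step_states.float(), full_states.float(), dim=-1)
print("worst-case cosine similarity:", cos.min().item())  # expect > 0.99
```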

wassname @wassname
@1e0sun huh, for some reason I thought they might be different, since one is teacher-forced. you've checked it's equiv? you could pass the kv cache to the 2nd step too - I've been doing it in janky ways /356c24d5c886163bf13751a72fcb7980

Lihao Sun @1e0sun
How do LLMs do CoT reasoning internally? In our new #ACL2026 paper, we show that reasoning unfolds as a structured trajectory in representation space. Correct and incorrect paths diverge, and we use this to predict correctness before the answer and correct errors mid-flight. 1/

Lihao Sun @1e0sun
@wassname haha yes - pass 1 generates the full output under deterministic settings, then pass 2 runs the prompt + output to extract all hidden states at once. with causal masking you recover the same activations without storing them token-by-token.
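
A minimal sketch of that two-pass recipe, assuming a Hugging Face causal LM; the model name, prompt, and generation length are placeholders:

```python
# Sketch of the two-pass extraction described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; any HF causal LM should do
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = tok("Solve step by step: 17 * 24 =", return_tensors="pt").input_ids
with torch.no_grad():
    # Pass 1: deterministic (greedy) generation; keep only the token ids.
    seq = model.generate(ids, max_new_tokens=64, do_sample=False)
    # Pass 2: one forward over prompt + output. Causal masking means each
    # position attends only to its prefix, so these states match generation.
    states = model(seq, output_hidden_states=True).hidden_states
# states: tuple of (num_layers + 1) tensors, each of shape [1, seq_len, d]
```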

wassname @wassname
@1e0sun "Two-pass single-sample greedy generation" whaaa this works?

Lihao Sun @1e0sun
Great question! We chose Instruct + R1-Distill + Base to span a range of training regimes while keeping the architecture constant; we wanted to establish a general claim. It would also be interesting to explore these structures in an RLVR model! Re the format concern: the organization exists across formats, including “Step X:” and freeform responses (no formatting instructions; numbered lists, paragraphs, single blocks). See section 3.5 for more details!

Thanh @t_d_tr
@1e0sun Nice paper! Question: is there a reason you chose Instruct and R1 instead of a direct RLVR model here? Could the high AUC be due to format triggering from distillation/SFT?

Lihao Sun @1e0sun
Reasoning length is also controllable. Steering hidden states toward the termination subspace shortens reasoning; steering away extends it. At moderate strengths this works as a smooth knob with minimal accuracy changes - push too hard and the model enters repetitive loops. 7/
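
For anyone who wants to try this style of intervention, a hedged sketch of additive activation steering via a forward hook; the direction `v_term`, the layer index, and the strength are assumptions, not the paper's exact setup:

```python
# Sketch: add a fixed direction to a decoder layer's hidden states during
# generation. `v_term` (termination direction) is assumed precomputed.
import torch

def make_steering_hook(v_term: torch.Tensor, alpha: float):
    v = v_term / v_term.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v.to(hidden.dtype)  # +alpha shortens, -alpha extends
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage with an HF causal LM (layer index 20 is illustrative):
# handle = model.model.layers[20].register_forward_hook(make_steering_hook(v_term, 4.0))
# model.generate(...)
# handle.remove()
```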

Lihao Sun @1e0sun
Great to see so many convergent findings on affective structure in LLMs lately!
Melanie Weber @mweber_PU

How are emotions represented in the latent geometry of LLMs? We analyze affective representations in latent space and show that they mirror classic valence-arousal models from psychology (similar to concurrent work @AnthropicAI @1e0sun) and display nonlinear structure that supports uncertainty quantification and steering in emotion tasks with implications for model transparency and AI safety.


Lihao Sun @1e0sun
While we find consistent circular VA geometry across Llama and Qwen models, @AnthropicAI concurrently finds similar structure in Claude. Check our work out! arxiv.org/pdf/2604.03147 And thanks to all collaborators: Andrew Lee (@a_jy_l), Lewen Yan, Xiaoya Lu, Jie Zhang, and Jing Shao. 6/

Lihao Sun @1e0sun
💡 New paper! Woke up to @AnthropicAI's emotion paper and realized - “wait, that's our finding too.” So we arXiv'd immediately. We concurrently uncovered a circular geometry of emotions organized by valence and arousal (VA), as well as steering effects on downstream behaviors like refusal and sycophancy. We further provide a mechanistic account of why: refusal and compliance tokens occupy distinct regions in this space. 1/

Lihao Sun @1e0sun
One possible reason why: consider refusal-related token embeddings (“no”) and compliance tokens (“sure”). Take their mean difference and project it onto our VA circle: it lands at 256°, negative in both V and A. Steering in -V or -A raises the likelihood of refusal tokens! More generally, emotion prompting and VA steering change the emission probabilities of these key tokens, thereby affecting downstream behaviors - further supported by logit shifts and neuron analysis. 5/
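
A rough sketch of that projection, assuming unit valence/arousal directions `v_axis` and `a_axis` are already fitted; the token lists and the use of input embeddings are illustrative:

```python
# Sketch: project the refusal-minus-compliance mean-diff onto the VA plane
# and read off its angle. Token choices here are placeholders.
import torch

def refusal_angle(model, tok, v_axis, a_axis):
    E = model.get_input_embeddings().weight  # [vocab, d]
    no_ids = [tok(w, add_special_tokens=False).input_ids[0] for w in (" no", " No")]
    yes_ids = [tok(w, add_special_tokens=False).input_ids[0] for w in (" sure", " Sure")]
    diff = E[no_ids].mean(0) - E[yes_ids].mean(0)  # refusal minus compliance
    v, a = diff @ v_axis, diff @ a_axis            # coordinates on the VA plane
    return torch.rad2deg(torch.atan2(a, v)) % 360  # ~256 deg per the thread
```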

Lihao Sun @1e0sun
Somewhat surprisingly, the VA axes also provide monotonic, bidirectional control over multiple downstream behaviors, including refusal and sycophancy. Arousal is a strong lever - increasing arousal leads to lower refusal rates, while decreasing arousal leads to more refusal behavior. 4/

Lihao Sun @1e0sun
@AnthropicAI Unlike Anthropic, we steer along the circular manifold at 0°, 30°, 60°, 90°, etc. This controls the valence and/or arousal level of the model’s outputs, validating that the recovered axes correspond to valence and arousal in a human-interpretable sense. 3/
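
A minimal sketch of steering at an angle on that circle; `v_axis` and `a_axis` stand for the fitted valence and arousal directions and are assumptions here:

```python
# Sketch: a steering vector at angle theta on the VA circle is just a
# cosine/sine mix of the two axis directions.
import math
import torch

def va_steering_vector(v_axis: torch.Tensor, a_axis: torch.Tensor,
                       theta_deg: float, strength: float = 1.0) -> torch.Tensor:
    t = math.radians(theta_deg)
    return strength * (math.cos(t) * v_axis + math.sin(t) * a_axis)

# 0 deg = +valence, 90 deg = +arousal, 180 deg = -valence; ~256 deg points
# toward the refusal region discussed in part 5/ of the thread.
```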

Lihao Sun @1e0sun
@AnthropicAI We use mean-diff to extract emotion steering vectors. PCA + ridge regression reveals a circumplex akin to the circumplex model of emotions in human psychology. Projections onto these axes correlate with human-crowdsourced VA ratings across 44k words (valence r=0.71). 2/
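
A hedged sketch of that pipeline; the file names, array shapes, and the exact PCA/ridge configuration are assumptions for illustration:

```python
# Sketch: recover a 2-D circumplex from emotion activation vectors and fit
# valence/arousal axes against human ratings. Inputs are assumed precomputed.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

acts = np.load("emotion_mean_diffs.npy")  # [n_emotions, d], mean-diff vectors
va = np.load("human_va_ratings.npy")      # [n_emotions, 2], valence & arousal

proj = PCA(n_components=2).fit_transform(acts)  # plot this to see the circle
reg = Ridge().fit(acts, va)                     # activations -> (valence, arousal)
v_axis, a_axis = reg.coef_[0], reg.coef_[1]     # directions used for projection
print("valence r:", np.corrcoef(acts @ v_axis, va[:, 0])[0, 1])
```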