Ruohao Guo

47 posts

@GuoOctavia

CS PhD Student @ICatGT | Undergrad @UofIllinois

Atlanta, GA · Joined October 2018
365 Following · 134 Followers
Pinned Tweet
Ruohao Guo@GuoOctavia·
Ever wondered if style lexicons still play a role in the era of LLMs? 🤔 We tested 13 established and 63 novel language styles across different LLMs. 🧠✨ It turns out lexicons are still crucial for style understanding! But how can we better leverage this lexical knowledge? Our approach: meta-tuning LLMs on lexical knowledge for generalizable language style understanding. Check out our latest work in the main conference at #ACL2024NLP! 🚀 arxiv.org/abs/2305.14592 @mlatgt @ICatGT
Wei-Lin Chen@WeiLin__Chen·
🚀 New paper from my internship at @Google! LLMs can “think” for a long time only to get the answer wrong; more tokens do not always help and may signal overthinking 😵‍💫 We introduce the Deep-Thinking Ratio (DTR), a new way to measure LLM reasoning effort. The idea: count the tokens the model had to think deeply to produce. 🧵
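The thread only names the metric, so as a rough illustration, here is a minimal sketch of one way a deep-thinking ratio could be computed, assuming DTR is the fraction of output tokens that fall inside the model's reasoning segment. The `<think>`/`</think>` delimiters and the formula are assumptions for illustration, not the paper's actual definition:

```python
def deep_thinking_ratio(tokens, open_tag="<think>", close_tag="</think>"):
    """Fraction of non-delimiter output tokens that fall inside
    thinking segments (hypothetical simplification of DTR)."""
    deep = 0
    inside = False
    for tok in tokens:
        if tok == open_tag:
            inside = True
        elif tok == close_tag:
            inside = False
        elif inside:
            deep += 1  # token produced while "thinking deeply"
    total = sum(1 for t in tokens if t not in (open_tag, close_tag))
    return deep / max(1, total)
```

Under this reading, a long chain-of-thought that still ends in a wrong answer would show a high ratio with no accuracy payoff, which is the overthinking pattern the tweet describes.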
Ruohao Guo retweeted
Cong Wei@CongWei1230·
Thrilled to open-source UniVideo 🎬! UniVideo brings unified multimodal understanding, generation, and editing to the video domain. One framework for:
• video/image understanding
• text/image → image/video generation
• free-form image/video editing
• reference-driven image/video generation/editing
Code: github.com/KlingTeam/UniV…
Model: huggingface.co/KlingTeam/UniV…
Project Page: congwei1230.github.io/UniVideo
Huggingface: huggingface.co/papers/2510.08…
Ruohao Guo retweeted
Yang Chen@ychenNLP·
🥈 Silver Medal at IOI 2025 & Outperforms DeepSeek-R1-0528 on LiveCodeBench. Instead of mixing different tasks together, we scale *Cascade RL* to develop general LLMs in curriculum (RLHF -> Instruct -> Math -> Code -> SWE). So many learnings, check out our report!👇
Wei Ping@_weiping

🚀 Introducing Nemotron-Cascade! 🚀 We’re thrilled to release Nemotron-Cascade, a family of general-purpose reasoning models trained with cascaded, domain-wise reinforcement learning (Cascade RL), delivering best-in-class performance across a wide range of benchmarks.

💻 Coding powerhouse. After RL, our 14B model:
• Surpasses DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro.
• Achieves silver-medal performance at IOI 2025 🥈.
• Reaches 43.1% pass@1 on SWE-Bench Verified, and 53.8% with test-time scaling.

🧠 What is Cascade RL? Instead of mixing heterogeneous prompts across domains, Cascade RL trains sequentially, domain by domain, which reduces engineering complexity, mitigates heterogeneous verification latencies, and enables domain-specific curricula and tailored hyperparameter tuning.

✨ Key insight: Using RLHF for alignment as a pre-step dramatically boosts complex reasoning, far beyond preference optimization. Subsequent domain-wise RLVR stages rarely hurt the benchmark performance attained in earlier domains and may even improve it, as illustrated in the following figure.

🤗 Models & training data 🔥 👉 huggingface.co/collections/nv…
📄 Technical report with detailed training and data recipes 👉 arxiv.org/pdf/2512.13607

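The stage ordering described in the thread (alignment RLHF first, then domain-wise stages) amounts to a sequential loop over domains. As a hedged sketch of that control flow, assuming placeholder names throughout (`rl_finetune`, the dataset dict, and the stage list are illustrative, not the actual Nemotron-Cascade training code):

```python
# Cascaded, domain-wise RL in spirit: run one RL stage per domain,
# each stage starting from the previous stage's checkpoint, instead
# of mixing heterogeneous prompts in a single run.

DOMAINS = ["rlhf_alignment", "instruct", "math", "code", "swe"]

def cascade_rl(model, datasets, rl_finetune):
    """Train sequentially, domain by domain; return the final model
    and the order of stages actually run."""
    stages_run = []
    for domain in DOMAINS:
        # Each stage may use its own curriculum and hyperparameters.
        model = rl_finetune(model, datasets[domain])
        stages_run.append(domain)
    return model, stages_run
```

One design consequence visible even in this sketch: because each domain runs in isolation, verifiers with very different latencies (e.g., unit tests vs. math checkers) never share a batch.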
Ruohao Guo retweeted
Alan Ritter@alan_ritter·
At #NeurIPS2025 through Sunday. Come say hi and check out our posters on:
🔒 Probabilistic reasoning for text anonymity estimation: Wednesday @ 11am
🤖 Efficient, self-improving agents: Friday @ 11am
Ahmad Beirami@abeirami·
Will be at NeurIPS Thu Dec 4 to Sun Dec 7, excited to reconnect with old friends and make new ones. If you are excited about AI engineering (orchestration, evals, and optimizing scaffolds), we are hiring! On Saturday I’ll be on panels at the Reliable ML & UniReps workshops.
Ruohao Guo retweeted
Kai Zhang@KaiZhang_CS·
Introducing early experience: using future states resulting from the agent’s own actions as scalable supervision to train itself, without reward 🧠!
1️⃣ Reward-free: can train directly in real-world environments.
2️⃣ Better RL warm-start: when continued with RL, leads to higher final performance than imitation-only warm-ups.
3️⃣ Data-efficient & scalable: outperforms imitation with even 1/8 of the data. 👇
Jason Weston@jaseweston

🌀 Agent Learning via Early Experience 🌀
📝: arxiv.org/abs/2510.08558
- SFT for agents is sparse; RL on long horizons is hard.
We provide new mid-training signals that work:
1) Implicit next-state world modeling task
2) Self-reflection on alternate states
- Strong improvements over 8 environments and multiple model families
- Works well for subsequent RL! 🧵1/5

Ruohao Guo retweeted
Yao Dou@Yaooo01·
Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26%, at <3% of the cost. #EMNLP2025 [1/6] 🧵
Ruohao Guo retweeted
Jungsoo Park@jungsoo___park·
What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
Xingyu Fu@XingyuFu2·
😌 Been wanting to post since March but waited for the graduation photo… Thrilled to finally share that I’ll be joining Princeton University as a postdoc @PrincetonPLI this August! Endless thanks to my incredible advisors and mentors from Penn, UW, Cornell, NYU, UCSB, USC, Columbia, MSFT, and beyond: you’ve shaped my thinking and helped me grow so much. If you’re into multimodal research, especially VLMs / fusion models / image & video generation, feel free to DM me, would love to connect!
Jaemin Cho@jmin__cho·
Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug 2025 - Jul 2026), so feel free to reach out! 🔎 Endless thanks to my amazing advisor @mohitban47, the @uncnlp group, my partner @HeesooJang2, and my family. I couldn’t have done this without your constant support 🙏 Also, a heartfelt shoutout to all the collaborators I’ve worked with over the years—your ideas, encouragement, and hustle have meant the world. Excited for what’s ahead. Let’s keep building together! ❤️
Ruohao Guo retweeted
Wei Xu@cocoweixu·
I am giving a keynote at the PrivateNLP Workshop (sites.google.com/view/privatenl…) at #NAACL2025 (Sunday 9am CT).
* GPT-4V is a performant geolocator, predicting the exact GPS coordinates of an image better than any SOTA system.
* LLMs can estimate privacy risk based on probabilistic reasoning, better than chain-of-thought.
Ruohao Guo retweeted
alphaXiv@askalphaxiv·
Introducing Deep Research for arXiv. Ask questions like “What are the latest breakthroughs in RL fine-tuning?” and get comprehensive lit reviews with trending papers automatically included. Turn hours of literature searches into seconds with AI-powered research context ⚡
Ruohao Guo retweeted
Hamish Ivison@hamishivi·
How well do data-selection methods work for instruction-tuning at scale? Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best! More below ⬇️ (1/8)
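As a rough illustration of the “simple embedding-based” idea mentioned in the tweet, the sketch below ranks a data pool by cosine similarity to the mean embedding of a target set and keeps the top k. This is an assumption-laden simplification for intuition, not the RDS method's actual implementation:

```python
import numpy as np

def select_by_embedding(pool_embs, target_embs, k):
    """Return indices of the k pool examples whose embeddings are
    most cosine-similar to the mean target embedding."""
    # Normalize pool rows so dot products become cosine similarities.
    pool = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    center = target_embs.mean(axis=0)
    center = center / np.linalg.norm(center)
    scores = pool @ center           # cosine similarity per pool example
    return np.argsort(-scores)[:k]   # indices of the k most similar
```

The appeal of this kind of baseline at scale is that it needs only one embedding pass over the pool plus a sort, with no per-example gradient or influence computation.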
Ruohao Guo retweeted
Ethan Mendes@EthanMendes3·
🚨New Paper: Better search for reasoning (e.g., web tasks) usually requires costly💰demos/rewards What if we only self-improve LLMs on state transitions—capturing a classic RL method in natural language? Spoiler: It works (⬆️39% over base model) & enables efficient search!🚀 🧵
Ruohao Guo retweeted
Tarek Naous@tareknaous·
What causes entity-related cultural biases in LMs? Is it just pre-training data? Our latest paper shows how varying linguistic phenomena exhibited by entities (such as word sense in Arabic) impact the cross-cultural performance of LMs. arxiv.org/abs/2501.04662
Yang Chen@ychenNLP·
I've successfully defended my PhD! 🎓 Really appreciate my advisors @alan_ritter @cocoweixu for everything throughout this journey 🥺. Huge thanks to my amazing committee @mchang21 @kartik_goyal_ @Hexiang_Hu 🚙 I'll move to CA and join @NVIDIA as a research scientist next month.
Alan Ritter@alan_ritter

Congratulations to @ychenNLP for successfully defending his PhD! Yang has done exciting work advancing both the multilingual and multimodal capabilities of LLMs. Many thanks to his committee: @cocoweixu (co-advisor), @mchang21, @Hexiang_Hu, @kartik_goyal_

Ruohao Guo@GuoOctavia·
Our experiments show that class randomization significantly boosts lexicon-based meta-tuning in LLMs, enabling effective leverage of lexical knowledge for zero-shot inference.
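The tweet reports that class randomization boosts lexicon-based meta-tuning. As a hedged sketch of what such randomization might look like, the snippet below reshuffles which surface label name is assigned to each class per episode, so the model cannot rely on memorized label strings and must use the lexicon. The function name and data layout are illustrative assumptions, not the paper's exact procedure:

```python
import random

def randomize_classes(example, label_names, rng=random):
    """Remap the example's label to a randomly shuffled label name,
    forcing reliance on lexical evidence rather than label priors."""
    shuffled = list(label_names)
    rng.shuffle(shuffled)
    # Permutation from original class names to randomized ones.
    mapping = dict(zip(label_names, shuffled))
    return {**example, "label": mapping[example["label"]]}
```

In a meta-tuning loop, this would be applied freshly per task episode so that label identities carry no stable signal across training tasks.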