Suhong Moon

60 posts

Suhong Moon

@SuhongMoon

PhD student at Berkeley AI Research (@berkeley_ai)

Berkeley · Joined December 2020
312 Following · 128 Followers
Pinned Tweet
Suhong Moon @SuhongMoon ·
Rich backstories enable deeper persona binding. LLMs conditioned on these backstories go beyond predicting human opinions to simulating how people perceive others and are perceived in return.
Minwoo (Josh) Kang @joshminwookang

🤔 Do LLMs exhibit in-group↔out-group perceptions like us? ❓ Can they serve as faithful virtual subjects of human political partisans? Excited to share our paper on taking LLM virtual personas to the *next level* of depth! 🔗 arxiv.org/abs/2504.11673 🧵

0 replies · 0 reposts · 3 likes · 368 views
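A minimal sketch of what the backstory-based persona conditioning above could look like in practice; `query_llm`, the interview-style prompt, and the sample questions are illustrative assumptions, not the paper's prompts or code:

```python
# Illustrative sketch only: `query_llm` is a hypothetical stand-in for any
# chat-completion client, and the interview-style prompt is an assumption.

def query_llm(prompt: str) -> str:
    # Replace with a real LLM call (API client of your choice).
    return "[model response]"

def ask_as_persona(backstory: str, question: str) -> str:
    """Condition the model on a rich backstory, then ask it a question."""
    prompt = (
        "Interviewer: Tell me about yourself.\n"
        f"Respondent: {backstory}\n"
        f"Interviewer: {question}\n"
        "Respondent:"
    )
    return query_llm(prompt)

backstory = (
    "I grew up in rural Ohio, took over my family's hardware store, and have "
    "voted in every election since 1996..."
)

# Opinion prediction: what the persona itself believes.
print(ask_as_persona(backstory, "How do you feel about raising the federal minimum wage?"))

# In-group/out-group perception: how the persona thinks the other side sees people like them.
print(ask_as_persona(backstory, "How do you think supporters of the other party view people like you?"))
```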
Suhong Moon retweeted
Joseph Jeesung Suh @JosephJSSuh ·
LLMs have dominated recent work on simulating human behaviors. But do you really need them? In discrete‑choice settings, our answer is: not necessarily. A lightweight graph neural network (GNN) can match or beat strong LLM-based methods. Paper: arxiv.org/abs/2511.02135 🧵👇
[image]
3 replies · 15 reposts · 62 likes · 36K views
Jaemin Cho @jmin__cho ·
Sharing some personal updates 🥳:
- I've completed my PhD at @unccs! 🎓
- Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙
- Currently exploring options + finalizing the plan for my gap year (Aug 2025 - Jul 2026), so feel free to reach out! 🔎
Endless thanks to my amazing advisor @mohitban47, the @uncnlp group, my partner @HeesooJang2, and my family. I couldn’t have done this without your constant support 🙏
Also, a heartfelt shoutout to all the collaborators I’ve worked with over the years: your ideas, encouragement, and hustle have meant the world. Excited for what’s ahead. Let’s keep building together! ❤️
[images]
65 replies · 51 reposts · 448 likes · 91K views
Suhong Moon retweeted
Nick Lee @nicholaszlee ·
🚀 Excited to share that our paper on Plan-and-Act has been accepted to ICML 2025. TL;DR below:
🔎 Problem:
• LLM agents struggle on complex, multi-step web tasks (or API calls, for that matter).
• Why not add planning for complex tasks and decouple planning from execution?
• Planning only helps if it’s accurate, and LLMs aren’t trained for that.
• Even small plan errors can drastically degrade performance.
💡 Thoughts:
• Separate PLANNER and EXECUTOR models: web agents especially benefit, since acting on HTML needs different skills than step-by-step planning (see the sketch after this tweet).
• Fine-tune the PLANNER and EXECUTOR with synthetic data; no manual annotations or simulators needed.
• Plan-and-Act provides a scalable framework for creating such synthetic data for web tasks.
📦 How we generate synthetic data:
• Use a teacher LLM to generate new user queries from seed examples.
• A second teacher tries to solve each query, generating action trajectories.
• We verify the success of each trajectory automatically.
• Another LLM reverse-engineers a plan from the trajectory.
• Finally, we expand this dataset further with more synthetic plans generated by LLMs.
• We then use this data to fine-tune the base models.
⚡ Results:
🏆 New SOTA for text-only open-source models, with up to a 40% improvement from our synthetic fine-tuning approach:
• 57.58% on WebArena-Lite
• 81.36% on WebVoyager
• 48.15% on WebArena
Paper: arxiv.org/abs/2503.09572
Joint work w/ @eren_lutfi78249 @sehoonkim418 @SuhongMoon @frt03_ @GopalaSpeech @KurtKeutzer @amir__gholami
[image]
0 replies · 5 reposts · 14 likes · 1.9K views
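A rough sketch of the planner/executor decoupling described in the tweet above; `call_planner`, `call_executor`, and the `env` interface are hypothetical placeholders for the two fine-tuned models and a web environment, not the paper's actual code:

```python
# Hedged sketch of a decoupled PLANNER / EXECUTOR agent loop.
# `call_planner`, `call_executor`, and `env` are hypothetical placeholders.

def call_planner(task: str) -> list[str]:
    """Fine-tuned PLANNER: turn a user task into a list of high-level steps."""
    raise NotImplementedError("plug in the planner model here")

def call_executor(step: str, observation: str) -> str:
    """Fine-tuned EXECUTOR: ground one high-level step into a low-level action
    (e.g. a click or form fill) given the current page observation."""
    raise NotImplementedError("plug in the executor model here")

def run_agent(task: str, env) -> bool:
    observation = env.reset(task)
    plan = call_planner(task)                       # planning happens separately from acting
    for step in plan:
        action = call_executor(step, observation)   # acting on HTML needs different skills
        observation, done = env.step(action)
        if done:
            return True
    return False
```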
Suhong Moon retweeted
Ritwik Gupta 🇺🇦 @Ritwik_G ·
Do LLMs understand probability distributions? Can they serve as effective simulators of probability? No! However, in our latest paper we show that, via in-context learning, LLMs update their broken priors in a manner akin to Bayesian updating. 📝 arxiv.org/abs/2503.04722
5 replies · 31 reposts · 159 likes · 32.3K views
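For reference, the textbook notion of Bayesian updating the tweet appeals to, in its simplest conjugate (Beta-Bernoulli) form; this is purely illustrative and not code or data from the paper:

```python
# Textbook Beta-Bernoulli updating, as a reference point for the concept.

def beta_bernoulli_update(alpha: float, beta: float, observations: list[int]):
    """Update a Beta(alpha, beta) prior over a coin's bias given 0/1 outcomes."""
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails

alpha, beta = 2.0, 8.0                        # a (possibly miscalibrated) prior, mean 0.2
alpha, beta = beta_bernoulli_update(alpha, beta, [1, 1, 0, 1, 1, 1, 0, 1])
print(alpha / (alpha + beta))                 # posterior mean shifts toward the observed rate
```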
Suhong Moon retweeted
Coleman Hooper @coleman_hooper1 ·
How can we efficiently scale up test-time compute with parallel tree search? 🚨 Introducing Efficient Tree Search (ETS): a new method for efficient and accurate test-time search on LLM reasoning tasks!
- Test-time scaling has emerged as a new axis for improving model performance by leveraging additional computation at inference time to solve more challenging problems.
- One promising approach for scaling compute at test time is search, where a model generates multiple candidates and we then filter them down to a single final response. This search can be structured as a tree, where each level corresponds to taking an additional step toward solving the problem.
- Previous tree search methods have either yielded low accuracy (due to insufficient exploration) or incurred substantial efficiency penalties from the high cost of exploring diverse trajectories. These penalties stem from increased memory requirements caused by reduced KV cache sharing during the search.
Our method, ETS, addresses this challenge by encouraging KV cache sharing during search to reduce memory consumption, while maintaining the exploration of semantically diverse trajectories that is critical for high accuracy. (A rough sketch of the search loop follows below.)
Paper: arxiv.org/abs/2502.13575
Code: github.com/SqueezeAILab/E…
Joint work with: @sehoonkim418 @SuhongMoon Kerem Dilmen @sudomonish @nicholaszlee Michael Mahoney @Sophia_Shao_ @KurtKeutzer @amir__gholami
🧵 [1/7]
2 replies · 7 reposts · 22 likes · 3.4K views
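A rough sketch of the kind of parallel tree search discussed above; the generation and scoring functions are placeholders, and the naive top-k pruning shown here is exactly the point where ETS would instead trade off score against KV-cache sharing and semantic diversity:

```python
# Hedged sketch of tree search over reasoning steps. `generate_next_steps`
# and `score` are placeholders (e.g. a sampled LLM and a process reward model).

def generate_next_steps(prefix: str, n: int) -> list[str]:
    raise NotImplementedError("sample n candidate next steps from the model")

def score(partial_solution: str) -> float:
    raise NotImplementedError("score a partial solution, e.g. with a reward model")

def tree_search(question: str, width: int = 4, branch: int = 4, depth: int = 6) -> str:
    frontier = [question]
    for _ in range(depth):
        candidates = []
        for prefix in frontier:
            # Children of the same prefix can share that prefix's KV cache;
            # which nodes survive pruning determines how much sharing remains.
            for step in generate_next_steps(prefix, branch):
                candidates.append(prefix + "\n" + step)
        # Naive pruning: keep only the top-scoring candidates. ETS instead
        # balances score, KV-cache sharing, and semantic diversity here.
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return max(frontier, key=score)
```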
Suhong Moon retweeted
Simon Guo @simonguozirui ·
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
[image]
9 replies · 68 reposts · 303 likes · 113.9K views
Suhong Moon @SuhongMoon ·
Surveys are key for public opinion research but are costly. Can AI help? We fine-tune LLMs on SubPOP, a newly curated large-scale dataset, reducing the human-LLM opinion gap by up to 46%. Please check our paper and code!
Joseph Jeesung Suh @JosephJSSuh

Can LLMs assist public opinion survey design by predicting responses? We fine-tune LLMs on our new large-scale survey response dataset, SubPOP, which reduces the distributional gap between human and LLM predictions by up to 46% 📊 A 🧵 on our findings: 👇

0 replies · 1 repost · 3 likes · 385 views
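One simple way to quantify a "distributional gap" between human survey responses and LLM predictions is sketched below; the choice of total variation distance is an assumption made for illustration, not necessarily the paper's metric:

```python
# Illustrative gap metric: total variation distance between the human and
# LLM distributions over a question's answer options (metric choice assumed).
import numpy as np

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

human = np.array([0.10, 0.25, 0.40, 0.25])   # survey-reported option shares
model = np.array([0.05, 0.15, 0.55, 0.25])   # LLM probabilities over the same options
print(total_variation(human, model))          # 0.15; smaller means a smaller gap
```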
Suhong Moon retweeted
Patrick Wu @tsunghan_wu ·
🚨 Launching The Visual Haystacks (VHs) Benchmark: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark to assess LMMs' capability in long-context visual retrieval and reasoning. Check out the 🧵 to see our findings on models like #GPT4o, #Gemini, and more! (1/7)
[image]
1 reply · 7 reposts · 22 likes · 14.3K views
Suhong Moon retweeted
Alex Pan @aypan_17 ·
LLMs have behaviors, beliefs, and reasoning hidden in their activations. What if we could decode them into natural language? We introduce LatentQA: a new way to interact with the inner workings of AI systems. 🧵
[image]
7 replies · 28 reposts · 172 likes · 34.2K views
Woosuk Kwon @woosuk_k ·
I’ll be at #NeurIPS 2024 this week and happy to chat about LLM inference. Feel free to reach out!
3 replies · 0 reposts · 17 likes · 1.2K views
Suhong Moon retweeted
David Chan @_dmchan ·
🚨 Call for Papers! 🚨 Join us at the Workshop on Human Alignment in AI Decision-Making Systems (IEEE CAI 2025) to explore challenges & opportunities in aligning AI with human values & societal norms. 📅 Papers due 1/15/25. Details ➡️ sites.google.com/view/ieee-cai-… #AIAlignment #AIethics
0 replies · 1 repost · 3 likes · 193 views
Suhong Moon retweeted
Heesoo Jang @HeesooJang2 ·
South Korean President Yoon Suk Yeol just declared emergency martial law to "eradicate the shameless pro-North Korean anti-state forces that are plundering the freedom and happiness of our people." This is a whole other level of authoritarian move we're seeing. #SouthKorea
[image]
1 reply · 3 reposts · 18 likes · 5.6K views
Suhong Moon retweeted
Coleman Hooper @coleman_hooper1 ·
🚨 Introducing Squeezed Attention! A new fast attention method for long-context inference providing up to 4x speedup 🚄
Many LLM applications require processing long input prompts for tasks such as document analysis and code generation; however, inference cost increases linearly with sequence length, making long-context inference prohibitively expensive and slow.
One key trait of many LLM applications is that a large portion of the input prompt is fixed across successive user queries, for example when a user asks multiple questions against a document, knowledge source, or codebase. This means we can perform offline optimizations that accelerate attention over the fixed context once user inputs arrive. Our hierarchical method can reduce the complexity of attention from linear to logarithmic with respect to the fixed context length.
Squeezed Attention accelerates attention from the user input to the fixed context by identifying which keys are important for a given query, then computing attention only with those important keys. This identification is performed by comparing the query with clusters of keys, then progressively refining the comparison using finer-grained clusters to pin down the important keys. (A rough sketch of the clustering idea follows below.)
Paper: arxiv.org/abs/2411.09688
Code: github.com/SqueezeAILab/S…
Joint work with: @sehoonkim418 @hiva_moh @sudomonish June Paik from @FuriosaAI Michael Mahoney @KurtKeutzer @amir__gholami
🧵 [1/7]
[image]
2 replies · 12 reposts · 28 likes · 2.5K views
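A rough, non-hierarchical sketch of the clustering idea described above: cluster the fixed context's keys offline, then at query time attend only over the keys in the clusters whose centroids best match the query. The flat k-means, dot-product assignment, and top-k cluster rule are simplifications for illustration, not the paper's implementation:

```python
# Hedged sketch of query-aware key selection via offline key clustering.
import numpy as np

def cluster_keys(keys: np.ndarray, n_clusters: int, iters: int = 10):
    """Offline step: group the fixed context's keys into clusters (simplified k-means)."""
    rng = np.random.default_rng(0)
    centroids = keys[rng.choice(len(keys), n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmax(keys @ centroids.T, axis=1)       # dot-product assignment (simplification)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = keys[assign == c].mean(axis=0)
    assign = np.argmax(keys @ centroids.T, axis=1)           # final assignment for the final centroids
    return centroids, assign

def squeezed_attention(query, keys, values, centroids, assign, top_clusters=2):
    """Online step: attend only over keys in the clusters most relevant to the query."""
    important = np.argsort(query @ centroids.T)[-top_clusters:]
    mask = np.isin(assign, important)
    k, v = keys[mask], values[mask]                          # skip keys deemed unimportant
    scores = (query @ k.T) / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v

# Offline once per fixed context, then reuse for every user query.
rng = np.random.default_rng(1)
keys, values = rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
centroids, assign = cluster_keys(keys, n_clusters=16)
out = squeezed_attention(rng.normal(size=64), keys, values, centroids, assign)
```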