Haohan Wang

325 posts


@HaohanWang

Assistant professor @iSchoolUI at UIUC, affiliated with CS and IGB. Previously @CarnegieMellon (across CBD, LTI, MLD). Trustworthy AI & Computational Biology

Champaign, Illinois · Joined November 2012
757 Following · 1.5K Followers
Pinned Tweet
Haohan Wang@HaohanWang·
Ever since I was a teenager, I have wondered why Google can make such a huge amount of money, and one reason, I now believe, is that it serves as a gatekeeper between users and the massive amount of information online. Nowadays, we are witnessing a rapid shift of this gatekeeper role from Google-style search engines to large language models. Therefore, what used to matter a lot in the search-engine context will soon start to matter in the LLM context. One example is how items get ranked: formerly by search engines (hence search engine optimization), and now by LLMs. We therefore introduce one of the first solutions to this question: "How can I write my product descriptions so that they will be ranked at the top when a user asks an LLM to recommend similar things to buy?" Here comes our recent work: 🚀 Controlling Output Rankings in Generative Engines for LLM-based Search 🚀 With a solution, a benchmark, and a demo. Check out our project page: llm-recommendation.vercel.app Or directly play with the demo to feel the power: ivonne-code.github.io/AI-recommendat…
Haohan Wang@HaohanWang·
Excited to learn that our EACL paper "Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models" has been covered by @QZeitgeist! We introduce a text-to-audio jailbreak that embeds harmful directives in narrative speech, exploiting acoustics to bypass text-calibrated safety in models like GPT-4o and Gemini 2.0 Flash, achieving up to a 98.26% success rate over baselines. Thanks to the authors @Yeyu4Yu, @kevvvv123123, @junzhuang_
Haohan Wang@HaohanWang·
Our ICML 2025 work is also part of the family x.com/HaohanWang/sta…
Haohan Wang@HaohanWang

Sharing our #ICML’25 paper that introduces REVOLVE — a new approach to prompt optimization that models how LLM responses evolve over time. It achieves +7.8% in prompt tuning, +20.7% in solution refinement, and +29.2% in code generation. 🚀

Haohan Wang@HaohanWang·
Celebrating the #ICLR2026 acceptance of our paper SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback 🚀 But what really matters is not the acceptance—it's the question that kicked everything off.

A few months back, I kept feeling like prompt optimization was strangely familiar. Then it clicked: we're replaying 40 years of neural network parameter optimization... compressed into just ~3 years. 🔂
➡️ Parameter side (1980s–2000s): Genetic algorithms → plain SGD (the big breakthrough moment) → Adam, momentum, adaptive rates, second-order tricks.
➡️ Prompt side (2022–2025): Evolutionary search (GPS, EvoPrompt) → textual gradients (ProTeGi, TextGrad—the "SGD moment") → what comes next?

We think SIPDO is a solid step toward the answer. Instead of passively optimizing against a fixed dataset, SIPDO closes the loop:
🌟 A synthetic data generator actively crafts challenging examples to expose the current prompt's exact weaknesses
🌟 The optimizer refines the prompt based on those failures
🌟 Difficulty ramps up progressively (curriculum-style)
🌟 The improved prompt feeds back to generate even harder data

It's inspired by adversarial training + curriculum learning, leading to faster convergence and dramatically more robust prompts—no extra human annotations needed.

We laid out this full "parallel evolution" framing in our recent blog post, tracing the arc from early genetic methods through textual gradients to where we believe Phase 3 (closed-loop, adaptive, history-aware systems like SIPDO) is headed next. If you're working on prompts, synthetic data, or LLM robustness, this historical lens might spark some ideas: the next real leap could be asking, "What would Adam (or even second-order methods) look like for prompts?"
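The four-bullet loop above can be sketched in a few lines. This is a minimal illustration of the closed-loop idea only; the function names (`closed_loop_optimize`, `generate_hard_examples`, `refine_prompt`) are hypothetical and not SIPDO's actual interface, and in practice the generator and optimizer would themselves be LLM calls.

```python
def closed_loop_optimize(prompt, llm, generate_hard_examples, refine_prompt,
                         rounds=5):
    """Hypothetical sketch of a SIPDO-style closed loop (not the paper's API)."""
    difficulty = 1
    for _ in range(rounds):
        # 1) Generator crafts examples targeting the current prompt's weaknesses
        examples = generate_hard_examples(prompt, difficulty)
        # 2) Keep only the cases the current prompt gets wrong
        failures = [(x, y) for x, y in examples if llm(prompt, x) != y]
        if failures:
            # 3) Optimizer refines the prompt from those failure cases
            prompt = refine_prompt(prompt, failures)
        # 4) Curriculum: the improved prompt feeds back harder data next round
        difficulty += 1
    return prompt
```

The key contrast with fixed-dataset prompt tuning is step 1: the evaluation set is regenerated each round to chase the current prompt's weaknesses, rather than staying static.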
Haohan Wang retweeted
Biology+AI Daily@BiologyAIDaily·
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

1. GenoMAS introduces a novel multi-agent framework that leverages large language models (LLMs) to automate gene expression analysis, addressing the complexity of genomic data and the need for domain expertise. This innovative approach combines the reliability of structured workflows with the adaptability of autonomous agents, achieving state-of-the-art performance in identifying gene–phenotype associations.

2. The core of GenoMAS is a guided-planning framework that transforms high-level task guidelines into executable code units, allowing agents to dynamically adjust their behavior based on evolving context. This balance between structure and flexibility enables the system to handle the intricate interdependencies in genomic data analysis while maintaining logical coherence.

3. GenoMAS employs a team of six specialized LLM agents, each contributing complementary strengths to a shared analytic canvas. The system integrates a diverse set of state-of-the-art LLMs, leveraging their unique capabilities in coding, reasoning, and domain expertise. This heterogeneous architecture significantly outperforms homogeneous LLM configurations.

4. The system achieves a Composite Similarity Correlation of 89.13% for data preprocessing and an F1 score of 60.48% for gene identification, surpassing prior art by 10.61% and 16.85% respectively. These results highlight the effectiveness of GenoMAS in producing biologically plausible gene–phenotype associations while adjusting for latent confounders.

5. GenoMAS incorporates a dynamic memory mechanism that stores validated code snippets for reuse, significantly improving efficiency. The system's ability to autonomously adapt and correct errors during execution further enhances its robustness and reliability in handling complex genomic datasets.

6. The framework is evaluated on the GenoTEX benchmark, a comprehensive testbed reflecting the demands of end-to-end scientific coding. GenoMAS demonstrates superior performance across all tasks, including dataset selection, data preprocessing, and statistical analysis, showcasing its potential to democratize bioinformatics analyses.

📜Paper: arxiv.org/abs/2507.21035 #Genomics #AI #MultiAgentSystems #GeneExpressionAnalysis #ScientificAutomation
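The "dynamic memory mechanism" for validated code snippets described in point 5 can be illustrated with a minimal cache sketch. The names here (`SnippetMemory`, `get_or_generate`) are hypothetical, not GenoMAS's actual interface; the point is only the reuse pattern: generate, validate, cache, and skip regeneration on later hits.

```python
class SnippetMemory:
    """Hypothetical sketch of a validated-snippet cache (not GenoMAS's API)."""

    def __init__(self):
        self._store = {}

    def get_or_generate(self, task_key, generate, validate):
        # Reuse a previously validated snippet when one exists
        if task_key in self._store:
            return self._store[task_key]
        snippet = generate(task_key)
        # Cache only code that passes validation, so failures are retried later
        if validate(snippet):
            self._store[task_key] = snippet
        return snippet
```

In an agent loop, `generate` would be an LLM call and `validate` an execution check, so repeated preprocessing steps across datasets hit the cache instead of re-prompting the model.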
Haohan Wang@HaohanWang·
#NeurIPS2025 LLMs can reason, but the reasoning does not always help. Check out our work for some counter-intuitive results, with a formalized understanding of the reasoning process of LLMs. 📍 Poster Session Wed, Dec 3, 2025 • 11:00 AM – 2:00 PM PST Exhibit Hall C, D, E — Booth #1414 Looking forward to seeing you! 🚀
Haohan Wang@HaohanWang

You're watching a few rounds of poker games. ♠️♠️♠️♠️ The cards look normal — but the outcomes don't. ♦️ No one explains the rules. You just see hands play out. Can you figure out what's going on? 🎯 That's the setup for LLMs.

Recently, there have been heated discussions on LLMs' overall performance and reasoning ability, centering around a hypothesis: more reasoning steps → better performance. We tested that assumption. And the results do not align with the hypothesis. 🙅‍♀️

We built four structured games — ♟chess, 🃏poker, 🎲dice, 🂡blackjack — each with hidden rules. The models see only transcripts. No labels. No rulebook. Just sparse examples.

⚠️ CoT-enabled models consistently underperform non-reasoning LLMs. We traced this failure to a three-stage cascade: decomposition errors from misframed sub-tasks, solving errors driven by noisy or misaligned logic, and summarization errors from poor stopping decisions. The deeper the reasoning chain, the more these errors accumulate. Our analysis shows a U-shaped tradeoff: more steps help — until they don't.

🛠️ To address this, we designed targeted interventions. Structured CoT, anchored examples, and token constraints consistently improve inductive accuracy — no retraining required. ✅ Reasoning helps only when it's structured. Blind reasoning hurts.

📄 arxiv.org/abs/2505.24225
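The three interventions named above (structured CoT, anchored examples, token constraints) can be sketched as a prompt-builder. This is an illustrative assembly only, not the paper's actual templates: the function name, the three-step wording, and the 200-token default are all assumptions.

```python
def build_inductive_prompt(transcripts, anchors, max_reasoning_tokens=200):
    """Hypothetical sketch combining the three interventions (not the paper's
    exact templates): anchored examples, a structured step list, and a token cap."""
    # Anchored examples: concrete observation -> rule pairs the model can imitate
    anchor_block = "\n".join(
        f"Example: {obs} => rule: {rule}" for obs, rule in anchors
    )
    # Structured CoT: a fixed decomposition plus an explicit stopping rule,
    # instead of letting the model frame its own sub-tasks
    steps = (
        "Answer in exactly three steps:\n"
        "1. List the observable regularities in the transcripts.\n"
        "2. State one candidate hidden rule consistent with all of them.\n"
        "3. Stop. Do not revise. "
        f"Use at most {max_reasoning_tokens} tokens of reasoning."
    )
    return (
        f"{anchor_block}\n\nTranscripts:\n"
        + "\n".join(transcripts)
        + f"\n\n{steps}"
    )
```

The design intuition matches the cascade analysis: the fixed step list targets decomposition errors, the anchors target noisy solving, and the stop instruction plus token cap targets poor stopping decisions.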

Haohan Wang@HaohanWang·
I will be traveling ✈️ to #NeurIPS in beautiful San Diego 🏖️ for the whole of next week. We are working on several topics related to agentic AI and AI for scientific discovery. Looking forward to reuniting with old friends and meeting new ones.
Haohan Wang@HaohanWang·
@LyceumCloud We saw several patterns that can predict defense mechanisms; it's been more than a year since those summaries, though.
Lyceum@LyceumCloud·
@HaohanWang Nice, bookmarking this. Curious what patterns you're seeing repeat across jailbreaks, and how you expect defenses to evolve over the next year.
Haohan Wang@HaohanWang·
Also, let me tag some collaborators @advtydv, @junzhuang_ (and also Haibo and Man Luo), since it will be interesting to put this coverage on the record.
Haohan Wang@HaohanWang·
Interesting: today I learned that one of our AI security works has been reported by several media outlets 🗞️🗞️🗞️🗞️ arxiv.org/abs/2506.12274 It's a new jailbreak algorithm that forces the model to spit out non-compliant responses. It's also a paper that never got lucky enough to pass the peer review process, so evidence once again that peer review might be broken 🥹
Haohan Wang@HaohanWang·
Thanks for the heads up, Maheep! It's a great pleasure to learn about your work @tarngerine. Our team was among the first to attempt marrying LLMs with SVG representations, and it's a great pleasure to see that potential get expanded. Probably we can chat more about it. arxiv.org/abs/2311.15543 arxiv.org/abs/2504.09764
julius tarng cyber inspector@tarngerine·
What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see if LLMs actually understand SVGs. It turns out – yes~ We found that semantic concepts transfer across text, ASCII, and SVG: