Qingni Wang

26 posts

@Ceeqnn

incoming phd @UCSB

Joined July 2022
181 Following · 181 Followers
Qingni Wang
Qingni Wang@Ceeqnn·
Excited to share that I’ll be presenting SAFER at #ICLR2026 🇧🇷 🎊 ✨ SAFER is a two-stage risk control framework for open-ended QA with LLMs.
Our goal is to move beyond heuristic uncertainty estimates toward statistically rigorous trustworthiness guarantees. Specifically, we introduce:
• Abstention-aware Sampling, which calibrates the minimum sampling budget needed to satisfy a user-specified risk level.
• Conformalized Filtering, which removes unreliable candidates while preserving coverage guarantees.
Together, SAFER provides controllable miscoverage risk in open-ended generation and takes a step toward more trustworthy LLM deployment.
📌 Poster Session: April 24, 2026 | 3:15–5:45 PM BRT
📍 Pavilion 3, Poster P3-#1806
Authors: @Ceeqnn @YFan_UCSC @xwang_lk
I’ll be presenting the poster, so feel free to stop by and chat if you’re around. Looking forward to discussions at ICLR!
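The two stages named above can be illustrated with a rough Python sketch. This is not the SAFER implementation: the function names, the data layout, and the plain empirical miss rate used to pick the budget (no finite-sample correction) are all assumptions made for illustration. The filtering step uses a standard split-conformal quantile.

```python
import math

def min_sampling_budget(first_correct_idx, alpha, max_budget):
    """Smallest budget k whose empirical miss rate on calibration data is <= alpha.

    first_correct_idx[i] is the 1-based position of the first correct sample
    for calibration question i, or None if no sample was correct (hypothetical
    data layout). Returns None, meaning "abstain", if no budget qualifies.
    """
    n = len(first_correct_idx)
    for k in range(1, max_budget + 1):
        misses = sum(1 for idx in first_correct_idx if idx is None or idx > k)
        if misses / n <= alpha:
            return k
    return None

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile over nonconformity scores of known-correct answers."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: keep every candidate
    return sorted(cal_scores)[rank - 1]

def filter_candidates(candidates, threshold):
    """Keep candidate answers whose nonconformity score is within the threshold."""
    return [answer for answer, score in candidates if score <= threshold]
```

Returning None from the budget search corresponds to abstaining when no budget up to the maximum meets the requested risk level.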
Qingni Wang
Qingni Wang@Ceeqnn·
Having a great time in Rio at #ICLR2026 — met so many brilliant and fun people here. Excited to share more about our work SAFER and SafeGround, on trustworthy LLMs and GUI grounding. Feel free to follow our work! 🌴🤖
Qingni Wang reposted
Chengzhi Liu
Chengzhi Liu@liuchen02938149·
Excited to share that EvoPresent will be showcased at #ICLR2026 🇧🇷! ✨
EvoPresent is a self-improvement agent framework for academic paper presentations. Our goal is to make AI go beyond simply “generating slides” by helping it build clearer narratives, create more aesthetically pleasing designs, and iteratively improve through aesthetic-aware feedback. We also introduce the EvoPresent Benchmark to evaluate presentation generation quality and aesthetic awareness.
📍 Poster session: Apr 25, 2026, 11:15 AM–1:45 PM PDT
📌 Location: Pavilion 4, P4-#4611
Unfortunately, none of the authors will be able to attend in person this time @Toby_Yang_7 @xwang_lk. However, our group member @Ceeqnn will present on my behalf, so feel free to stop by and chat with her! 💬
If you are interested in AI agents and multimodal AI, we would also be happy to connect online!
Qingni Wang reposted
Xin Eric Wang (hiring postdoc)
Computer-use agents are getting very capable. But capability is not the bottleneck anymore. Reliability is.
Benchmarks reward “works once.” Real-world systems require “works every time.”
In “On the Reliability of Computer Use Agents”, we study WHY this gap exists and HOW to close it. Thread 👇
Qingni Wang reposted
Qianqi "Jackie" Yan
Qianqi "Jackie" Yan@qianqi_yan·
🚀 Excited to share our new work: OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
Multimodal LLMs can process text 📝, images 🖼️, audio 🎧, and video 🎬 together, but when they generate a response, which input actually supported each claim?
OmniTrace traces every generated span back to its multimodal sources during decoding across text, image, audio, and video. No retraining needed. Fully plug-and-play. 🔌
📄 Paper: github.com/eric-ai-lab/Om…
💻 Code: github.com/eric-ai-lab/Om…
🌐 Project: jackie-2000.github.io/omnitrace.gith…
📦 pip install omnitrace
🧵👇
Xin Eric Wang (hiring postdoc)
Reliability is the fundamental bottleneck for GUI agents. ⚠️ One wrong click can trigger irreversible, costly actions 💥
Introducing SafeGround 🛡️: an uncertainty-calibrated framework that knows when not to act, enabling risk-aware GUI grounding with statistical guarantees 📊
Key idea: the real danger is silent failure 🤫 Most GUI grounding models always output a coordinate, even when they’re unsure ❌📍
Instead, SafeGround:
📐 Estimates spatial uncertainty from prediction variability;
🎯 Calibrates a decision threshold with statistical guarantees;
🛑 Abstains or defers high-risk actions, enabling risk-controlled GUI interaction, even for black-box models. 🔒🤖
Qingni Wang
Qingni Wang@Ceeqnn·
Huge thanks to my co-first author Yue Fan @YFan_UCSC and my advisor Prof. Xin Eric Wang @xwang_lk. Grateful for the guidance, support, and inspiring collaboration throughout this project! ✨
Qingni Wang
Qingni Wang@Ceeqnn·
🧵 6/6 📊 Visualizing the Trade-off: Safety vs. Efficiency This chart illustrates our Cascading Rate—how frequently we ask the expensive expert for help. The key takeaway? SafeGround is extremely selective. We can maintain superior system-level performance while keeping the cascading rate surprisingly low. Instead of routing every query to the cloud, we only escalate the truly hard cases. You get the reliability of an expert model, but at a fraction of the cost. 💰✅
Qingni Wang
Qingni Wang@Ceeqnn·
🧵5/6 🚀 Boost Accuracy via Cascading
Safety doesn't mean low performance. In fact, it unlocks high accuracy! With SafeGround acting as a gatekeeper, we enable Cascading Inference:
✅ Easy tasks: Handled by the fast, local model.
⚠️ Hard/Risky tasks: Detected by uncertainty and escalated to an expert model.
The result? We achieve 58.66% accuracy on ScreenSpot-Pro, beating the Gemini-only baseline by +5.38%! 📈
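As a rough illustration of the gatekeeping idea, the routing logic could be sketched in Python as below. The callables `local_model` and `expert_model`, their return shapes, and the threshold value are hypothetical placeholders, not SafeGround's actual interfaces.

```python
def cascade(query, local_model, expert_model, threshold):
    """Route a query: trust the local model when its uncertainty is low,
    otherwise escalate to the expert model.

    local_model(query) is assumed to return (prediction, uncertainty);
    expert_model(query) is assumed to return a prediction.
    """
    prediction, uncertainty = local_model(query)
    if uncertainty <= threshold:
        return prediction, "local"
    return expert_model(query), "expert"

def cascading_rate(routes):
    """Fraction of queries that were escalated to the expert model."""
    return sum(route == "expert" for route in routes) / len(routes)
```

A lower threshold sends more queries to the expert, which raises cost, so the cascading rate is the quantity to watch when tuning it.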
Qingni Wang
Qingni Wang@Ceeqnn·
🚨 New paper alert 🚨
📌 How can we make GUI grounding models reliable in real-world interactions?
We introduce 🚀 SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration
In GUI agents, a single wrong click isn’t just an error: it can trigger costly or irreversible actions (e.g., unintended payments 💸 or deleting important files 🗑️).
The real danger is silent failure: most GUI grounding models always output a coordinate, even when they’re unsure.
Instead of trusting a single predicted point, SafeGround:
• estimates spatial uncertainty from prediction variability
• calibrates a decision threshold with statistical guarantees
• enables risk-controlled GUI actions, even with black-box models
💻 Code: github.com/Cece1031/SAFEG…
📄 Paper: arxiv.org/pdf/2602.02419
🧵1/6 #Agents #GUI
Qingni Wang
Qingni Wang@Ceeqnn·
🧵4/6 📉 Statistical Safety Guarantees
"Maybe it works" isn't good enough for high-stakes actions. We adopt the Learn Then Test (LTT) paradigm to calibrate our uncertainty threshold.
🎯 You set the risk level.
🛡️ We guarantee the control.
SafeGround provides finite-sample guarantees on the False Discovery Rate (FDR): with high probability over the calibration data, the error rate among the actions the agent chooses to execute stays at or below your limit.
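A minimal sketch of LTT-style threshold calibration follows, under stated assumptions: a binomial tail p-value per candidate threshold and a Bonferroni correction over the grid. The function names, the grid, and the choice of p-value and correction are illustrative and may differ from what the paper uses.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def calibrate_threshold(uncertainties, is_wrong, grid, alpha, delta):
    """Return the most permissive threshold certified at risk level alpha.

    uncertainties[i] and is_wrong[i] describe held-out calibration example i.
    For each candidate threshold we test H0: "error rate among executed
    actions exceeds alpha" and keep thresholds whose p-value survives a
    Bonferroni correction at level delta. Returns None if none is certified.
    """
    valid = []
    for lam in grid:
        accepted = [w for u, w in zip(uncertainties, is_wrong) if u <= lam]
        n, k = len(accepted), sum(accepted)
        p_value = binom_cdf(k, n, alpha) if n > 0 else 1.0
        if p_value <= delta / len(grid):
            valid.append(lam)
    return max(valid) if valid else None
```

Here alpha is the user-chosen risk level and delta is the allowed probability that the guarantee fails. A None result means no threshold on the grid could be certified, so the safe fallback is to abstain on everything.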
Qingni Wang
Qingni Wang@Ceeqnn·
🧵3/6 Uncertainty: It's all about Spatial Distribution 🗺️
Standard probability scores (logits) are often misleading in GUI tasks. We take a different approach. By sampling multiple outputs, we build a Spatial Density Map to measure uncertainty along 3 dimensions:
1️⃣ Ambiguity: Are there competing buttons?
2️⃣ Dispersion: Is the attention scattered?
3️⃣ Concentration: Is there a clear focal point?
We combine these signals into a unified uncertainty score that captures the shape of the model’s confusion and consistently improves AUROC / AUARC across models (Tables 2 & 3).
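To make the three signals concrete, here is one possible way to compute them from a normalized density map. The specific formulas (runner-up ratio, normalized entropy, top-cell mass) and the equal weights are assumptions for illustration only. The thread does not give the paper's definitions.

```python
import math

def uncertainty_signals(dist):
    """Compute (ambiguity, dispersion, concentration) from a density map.

    dist maps a screen cell to its probability mass (masses sum to 1).
    """
    masses = sorted(dist.values(), reverse=True)
    top = masses[0]
    second = masses[1] if len(masses) > 1 else 0.0
    ambiguity = second / top  # how strong is the runner-up region?
    entropy = -sum(p * math.log(p) for p in masses if p > 0)
    # Entropy normalized by its maximum over the occupied cells.
    dispersion = entropy / math.log(len(masses)) if len(masses) > 1 else 0.0
    concentration = top  # mass of the clearest focal point
    return ambiguity, dispersion, concentration

def combined_uncertainty(dist, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum in which high concentration lowers the score."""
    ambiguity, dispersion, concentration = uncertainty_signals(dist)
    w_a, w_d, w_c = weights
    return w_a * ambiguity + w_d * dispersion + w_c * (1.0 - concentration)
```

With these definitions, a map with all mass in one cell scores 0, while two equally likely cells score much higher because both ambiguity and dispersion are at their maximum.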
Qingni Wang
Qingni Wang@Ceeqnn·
🧵2/6 How SafeGround works
Given a GUI input, SafeGround runs multiple stochastic grounding passes and aggregates them into a spatial distribution over the screen.
From this distribution, we quantify uncertainty and calibrate a threshold on held-out data. At test time:
• low uncertainty → execute directly
• high uncertainty → abstain or cascade
The key idea: use spatial uncertainty to decide when an action is safe to take.
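The pipeline described above can be sketched end to end in a few lines of Python. Everything here is a simplified stand-in: the grid-histogram density, the "one minus top-cell mass" uncertainty, the cell size, and the function names are assumptions, not the method from the paper.

```python
from collections import Counter

def spatial_distribution(points, cell=50):
    """Bin sampled click coordinates (pixels) into a normalized grid histogram."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

def decide(points, threshold, cell=50):
    """Execute at the densest region if uncertainty is low, else abstain.

    points are click predictions from repeated stochastic grounding passes;
    threshold is assumed to have been calibrated on held-out data.
    """
    dist = spatial_distribution(points, cell)
    top_cell, top_mass = max(dist.items(), key=lambda kv: kv[1])
    uncertainty = 1.0 - top_mass
    if uncertainty <= threshold:
        inside = [(x, y) for x, y in points
                  if (int(x // cell), int(y // cell)) == top_cell]
        cx = sum(x for x, _ in inside) / len(inside)
        cy = sum(y for _, y in inside) / len(inside)
        return "execute", (cx, cy), uncertainty
    return "abstain", None, uncertainty
```

When the samples agree, the click lands at their mean inside the densest cell. When they scatter across the screen, the function abstains, and the caller can defer to a human or a stronger model.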
Qingni Wang reposted
Xin Eric Wang (hiring postdoc)
🎉 Three papers have been accepted to #ICLR2026! Huge congrats to our students and collaborators!
🔹 SAFER: Risk-Constrained Sample-then-Filter in LLMs, led by @Ceeqnn & @YFan_UCSC
🔹 Presenting a Paper is an Art, led by @liuchen02938149 & @Toby_Yang_7, in collaboration with @OrbyAI
🔹 PhyWorldBench, led by @jinggu4ai (now @xAI), in collaboration with @nvidia
Proud of everyone involved. Amazing work! 🚀
Qingni Wang
Qingni Wang@Ceeqnn·
Many thanks to my co-author Yue Fan @YFan_UCSC and my internship advisor Prof. Xin Eric Wang @xwang_lk for their guidance, support, and inspiring collaboration.✨🥳
Qingni Wang
Qingni Wang@Ceeqnn·
📊 Evaluation Robustness (ROUGE-L): Even when correctness is judged with a lexical-overlap metric like ROUGE-L, SAFER sustains low empirical error rates, indicating that its reliability holds across evaluation schemes. #AI #Evaluation #LLM 🧵5/5
Qingni Wang
Qingni Wang@Ceeqnn·
🚨 Building Reliable Open-Ended QA? Meet SAFER: Risk-Controlled Sample-then-Filter for LLMs! 🚨
SAFER introduces a two-stage framework that controls uncertainty and guarantees statistical reliability in open-ended question answering by dynamically calibrating sampling and filtering. It helps models know when to trust their answers and when to abstain.
💡 Why SAFER matters: In real-world deployments, LLMs must respond responsibly, knowing when to act confidently and when to abstain or defer. SAFER ensures that, with provable miscoverage control and minimal sample waste, LLMs can provide trustworthy answers under risk constraints. #AI #LLM
📄 Paper: arxiv.org/2510.10193
💻 Code: github.com/Cece1031/SAFER
🧵1/5