Chaowei Xiao

304 posts

Chaowei Xiao

Chaowei Xiao

@ChaoweiX

Assistant Professor @Johns Hopkins University Researcher@NVIDIA| Researcher on AI Safety/Security

Katılım Ekim 2020
578 Takip Edilen2.3K Takipçiler
Chaowei Xiao
Chaowei Xiao@ChaoweiX·
Happy to introduce: SeClaw (safo-lab.github.io/seclaw/) — 𝘁𝗵𝗲 secure Claw against diverse AI security threats. SeClaw designs the system-level defense solution to build secure-version claw inspired by our NeurIPS paper (DRIFT:arxiv.org/abs/2506.12104). Core security capabilities 🧱 #Agent Execution Isolation: SeClaw keeps the project on the host and only runs agent operations through mapped execution in Docker. ♻️ #Snapshot & #Rollback: SeClaw supports an efficient CoW rollback mechanism that can quickly snapshot and restore mounted host/container files. You can quickly restore to a known-good state after any risky operations. Let your agent run free! 🛡️ #PromptInjection Defense (System + Model Levels): SeClaw enforces Control-Flow Integrity (CFI) and Information-Flow Integrity (IFI) at the system level to constrain the agent’s valid action space and block unsafe decision paths. At the model level, SeClaw uses a guard model to sanitize suspicious tool outputs 🔍 #Skill #Audit: Scans skills for dangerous patterns (prompt injection, exfiltration, and destructive commands). 🧠 Memory Audit: Scans memory files for stored prompt-injection payloads, credentials, and PII leakage risks. 📜 Execution Audit: Records full task traces and reports potentially risky actions after each task completion. 🔐 Privacy Protection: SeClaw monitors potential privacy leaks during agent execution, including identity information, API keys, SSH keys, and other sensitive credentials. Suspicious exposures are detected and flagged. ⚠️ Risky Operation Protection: SeClaw detects potentially dangerous commands (e.g., rm -rf, sudo, or destructive system modifications). When such operations are triggered, SeClaw requires explicit user confirmation before execution, reducing the risk of unintended damage caused by agent tool misuse. 📡 Secure Communication Isolation: SeClaw isolates communication channels by maintaining separate context windows for each interaction source. This prevents cross-channel prompt injection and ensures that messages from one channel cannot manipulate the agent’s behavior in another. 🌐 Network Security Controls: SeClaw provides secure network communication through HTTPS enforcement, request timeout protection, and configurable network modes for agent execution environments, reducing the risk of network-based attacks and uncontrolled external access. #Agent #Security #Openclaw #Safety Github: github.com/SaFo-Lab/seclaw website: safo-lab.github.io/seclaw/
English
1
13
20
1.5K
Chaowei Xiao retweetledi
Starc
Starc@Starc_Institute·
Today’s breakdown of a paper worth thinking about: Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis arxiv.org/abs/2512.00966 Paper recap: 🌟 Studies indirect prompt injection attacks (IPIAs) in agentic pipelines where untrusted tool or environment data contains hidden instructions.2512.00966v1 🌟 Proposes IntentGuard, a defense based on instruction-following intent analysis rather than surface-level instruction detection. 🌟 Introduces an Instruction-Following Intent Analyzer (IIA) that extracts the set of instructions a reasoning-enabled LLM intends to follow. 🌟 Implements IIA via three thinking interventions: start-of-thinking prefilling, end-of-thinking refinement, and adversarial in-context demonstration. 🌟 Uses origin tracing with sliding-window embedding matching to determine whether intended instructions originate from trusted or untrusted segments. 🌟 If an intended instruction overlaps with untrusted data, the system either alerts the user or sanitizes and regenerates. 🌟 Evaluated on AgentDojo and Mind2Web with Qwen3-32B and gpt-oss-20B, reporting large ASR reductions under adaptive attacks with minimal utility loss. Discussion threads (auto processing from live discussion transcripts) 🫧 The discussion began by emphasizing the paper’s core framing: prompt injection defense should not focus on detecting malicious-looking text, but on whether the model intends to follow instructions originating from untrusted data. The presenter contrasted this with naïve detectors that flag instruction-like strings even when the model would ignore them, such as homework instructions inside an email that are irrelevant to the user’s task. 🫧 Several questions focused on how intent is extracted and traced. The group walked through the pipeline where all intended instructions are first collected into an instruction set, then traced back to their origin using a sliding-window similarity search. If an instruction is traced to untrusted data, the system either enters alert mode (asking for user confirmation) or recovery mode (removing the instruction from the instruction set rather than masking raw text). The transcript notes that the paper is somewhat unclear about this distinction, especially around what is removed and where. 🫧 A long discussion examined edge cases where the same instruction appears in both trusted and untrusted contexts. For example, if the user asks for a meeting overview and an email also contains the same request, participants questioned whether IntentGuard might accidentally sanitize the legitimate instruction. The presenter suggested that, in such cases, the system would likely require user intervention, and that behavior depends on whether the model decides to enter alert mode or recovery mode. The discussion did not fully resolve this ambiguity. 🫧 There was also sustained skepticism about component importance and ablation coverage. While the paper provides ablations for the intent-instruction stage (start-of-thinking and end-of-thinking), multiple participants noted that other components—such as intent detection quality, trusted/untrusted segmentation, and origin tracing assumptions—were not separately ablated. This made it difficult to assess which parts of the system are truly essential. 🫧 Another recurring concern was evaluation focus. The group noted that, despite a strong conceptual story, most results ultimately reduce to ASR numbers. Questions were raised about whether the experiments truly validate the intent-based framing, or whether they mainly demonstrate robustness under specific attack setups without clearly isolating failure cases of competing methods. One comment explicitly stated that the paper “tells a beautiful story,” but the experiments do not fully support all of its claims. Thank you for the wonderful work: Mintong Kang, Chong Xiang, @sanjayatwork, @ChaoweiX, @uiuc_aisecure, Edward Suh
English
4
1
6
532
Chaowei Xiao retweetledi
Marco Pavone
Marco Pavone@drmapavone·
The Autonomous Vehicle (AV) Research Group @NVIDIAAI is looking for talented interns! Dive into cutting-edge work—from reasoning models and generative simulation to AI safety—and help shape the future of AV and embodied AI. Ready to push the limits? Apply now: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…
English
2
14
62
9K
Kangwook Lee
Kangwook Lee@Kangwook_Lee·
Happy to share that I got tenured last month! While every phase in life is special, this one feels a bit more meaningful, and it made me reflect on the past 15+ years in academia. I'd like to thank @UWMadison and @UWMadisonECE for tremendous support throughout the past six years, helping me grow. I am very grateful to all the teachers I’ve met in the past 15+ years of research since undergrad. Prof. Sae-Young Chung introduced me to engineering, and in particular, information theory. Prof. Yung Yi and Prof. Song Chong introduced me to communication network theory, and from Prof. Yung Yi I learned the true passion for research. I miss him a lot. At Berkeley, I learned everything about research from my advisor Prof. Kannan Ramchandran. In particular, I learned that the most important motivation behind great research is endless curiosity and the desire to really understand how things work. From my postdoc mentor Prof. Changho Suh at KAIST, I learned the mindset of perfection, making every single paper count. During my assistant professorship, I was lucky to have the best colleagues. I learned so much from Rob (@rdnowak) and Dimitris (@DimitrisPapail). I am still learning from Dimitris' unique sense of research taste and Rob's example of how to live as the coolest senior professor. I also learned a lot from the Optibeer folks Steve Wright, Jeff Linderoth, and my ECE colleagues Ramya (@ramyavinayak) and Grigoris (@Grigoris_c). Thank you all! I’d like to thank my former students and postdocs too. Daewon and Jy-yong (@jysohn1108) joined my lab early on and worked on many interesting projects. Changhun and Tuan (@tuanqdinh) joined midway through his PhD and worked on interesting research projects, and in particular, Tuan initiated our lab’s first LLM research five years ago! Yuchen (@yzeng58), Ziqian (@myhakureimu), and Ying (@yingfan_bot) joined around the same time, and working with them has been the most fun and rewarding part of my job. Each took on a challenging topic and did great work. Yuchen advanced LLM fine-tuning, especially parameter-efficient methods. Ziqian resolved the mystery of LLM in-context learning. Ying explored "a model in a loop," focusing on diffusion models and looped Transformers. They all graduated earlier this year and are continuing their research at @MSFTResearch and @Google. Best wishes! 🥰 I am also grateful for co-advising Nayoung (@nayoung_nylee), Liu (@Yang_Liuu), and Joe (@shenouda_joe) with Dimitris and/or Rob. Nayoung's work on Transformer length generalization, Liu's on in-context learning, and Joe's on the mathematical theory of vector-valued neural networks are all very exciting. They are all graduating very soon, so stay tuned! (And reach out to them if you have great opportunities!) I also had the pleasure of working with master's students Ruisu, Andrew, Jackson (@kunde_jackson), Bryce (@BryceYicongChen), and Michael (@michaelgira23), as well as many visiting students and researchers. Thank you for being such great collaborators. I’d like to thank and introduce the new(ish) members too. Jungtaek (@jungtaek_kim) and Thomas are studying LLM reasoning. Jongwon (@jongwonjeong123) just joined, and interestingly he was an MS student in Prof. Chung’s lab at KAIST, which makes him my academic brother turned academic son. Ethan (@ethan_ewer), Lynnix, and Chungpa (visiting) are also working on cool LLM projects! Thank you to @NSF, @amazon, @WARF_News, @FuriosaAI, @kseayg, and KFAS for generous funding. I also learned a lot from leading and working with the AI team at @Krafton_AI, particularly with Jaewoong @jaewoong_cho, so thank you for that as well. Last and most importantly, thanks to my family! ❤️ I only listed my mentors and mentees here, not all my amazing collaborators, but thank you all for the great work together. With that, I’m excited for what’s ahead, and so far no "tenure blues." Things look the same, if not more exciting... haha!
English
63
6
297
24.1K
Chaowei Xiao retweetledi
Huan Sun
Huan Sun@hhsun1·
Important that @AnthropicAI is considering new attacks specific to the browser, such as "hidden malicious form fields in a webpage’s Document Object Model (DOM) invisible to humans", which is exactly what our earlier work EIA (Environmental Injection Attack) focuses on, led by @LiaoZeyi and @LingboMo at @osunlp, accepted to #ICLR2025. We gave a specific example of "hidden malicious form fields invisible to humans" in Figure 1: arxiv.org/abs/2409.11295
Huan Sun tweet media
Anthropic@AnthropicAI

We’ve developed Claude for Chrome, where Claude works directly in your browser and takes actions on your behalf. We’re releasing it at first as a research preview to 1,000 users, so we can gather real-world insights on how it’s used.

English
0
6
30
5.3K
Chaowei Xiao retweetledi
Jianwei Yang
Jianwei Yang@jw2yang4ai·
Life Update: Now that I have finished the presentation of last @MSFTResearch project Magma at @CVPR, I am excited to share that I have joined @AIatMeta as a research scientist to further push forward the boundary of multimodal foundation models! I have always been passionate about building multimodal AI systems that can interact with human and environments. In the past five years, I am very fortunate to lead and contribute to a number of exciting projects on (a) vision and multimodal foundation: Focal Attention, FocalNet, UniCL, RegionCLIP, GLIP, Florence;  (b) generalist multimodal vision models: X-Decoder, SEEM, Semantic-SAM and Grounding-DINO; (c) multimodal large language models: LLaVA variants, Phi-3-Vision; (d) multimodal agentic model: SoM + OmniParser, TraceVLA, LAPA and Magma. I’m also very proud to have contributed to impactful projects, such as LLaVA-Med, GigaPath and BiomedParse, advancing AI for healthcare and human good. These are truly meaningful footnotes in my journey at MSR! In the past five years while I am staying at MSR, the world has witnessed tremendous breakthroughs in AI as well as those brought by AI. Looking ahead, the opportunity to advance AI research for a better world has never been so exciting. I feel so lucky to be part of this. Now right after five years, it feels like the right time for me to "graduate" from MSR and embrace new challenges beyond! Thank you all again for the support, mentorship, and friendship!
Jianwei Yang tweet media
English
52
6
384
30K
Chaowei Xiao
Chaowei Xiao@ChaoweiX·
I will be at CVPR from 10-12 and introduce our recent work on AI safety/security at Robust Foundation Model workshop cvpr24-advml.github.io. Please feel free to reach out if you are interested in safey/security topic
English
0
0
4
758
Chaowei Xiao
Chaowei Xiao@ChaoweiX·
Access control is a key concept for the computer security domain to ensures only authorized users can access sensitive assets. In our ACL paper, we applied this classic security concept to the large language models domain for safety. #safety #LLM #acl2025
Qin Liu@QinLiu_NLP

🚨 New paper accepted to #ACL2025! We propose SudoLM, a framework that lets LLMs learn access control over parametric knowledge. Rather than blocking everyone from sensitive knowledge, SudoLM grants access to authorized users only. Paper: arxiv.org/abs/2410.14676… 🧵[1/6]👇

English
0
0
6
1.2K
Chaowei Xiao retweetledi
Fei Wang
Fei Wang@fwang_nlp·
🎉 Excited to share that our paper, "MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding", will be presented at #ICLR2025!​ 📅 Date: April 24 🕒 Time: 3:00 PM 📍 Location: Hall 3 + Hall 2B #11 MuirBench challenges multimodal LLMs with diverse multi-image tasks, highlighting the need for models that can reason beyond single images. 🔗 Learn more: muirbench.github.io 🖼️ Poster: iclr.cc/media/PosterPD… #MuirBench #MultimodalAI #ICLR2025
Fei Wang tweet media
English
0
16
54
3.7K
Soheil Feizi
Soheil Feizi@FeiziSoheil·
Wow, I am speechless and deeply honored to receive the Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the U.S. government on outstanding scientists and engineers early in their careers. I’m grateful for the recognition of our work in Reliable AI and to my amazing students, colleagues, and mentors who made this possible. #PECASE #AI Read more: whitehouse.gov/ostp/news-upda…
English
46
6
299
17.2K