xy

33 posts


@xyliu_cs

PhD student in AI @cuhksz

Hong Kong · Joined February 2022
125 Following · 25 Followers
xy retweeted
Boxi Yu @BoshCavendish
OpenAI just confirmed (openai.com/index/why-we-n…): SWE-Bench Verified has flawed tests that reject correct solutions -- 59.4% of their audited 27.6% subset. Their recommendation: stop using Verified, switch to Pro.
But is Pro safe? We tested it. SWE-ABS strengthens 64.7% of 150 sampled SWE-Bench Pro instances -- weak tests are not a Verified-only problem.
Instead of abandoning SWE-Bench Verified, we fix the tests. SWE-ABS rejects 19.78% of "solved" patches from the top 30 agents as semantically wrong, leading to a 14.56% average drop in resolved rate -- and all 30 agents' rankings change.
Introducing SWE-ABS: adversarial benchmark strengthening for code-agent evaluation.
Paper: arxiv.org/abs/2603.00520
Code: github.com/OpenAgentEval/…
Data: huggingface.co/datasets/OpenA…
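The headline numbers come down to re-running each agent's accepted patches against the strengthened tests and recomputing the leaderboard. Below is a minimal sketch of that bookkeeping, assuming a hypothetical per-agent verdict table; it is an illustration, not the released SWE-ABS harness.

```python
# Re-check each agent's instances under strengthened tests, then recompute
# resolved rates and rankings. Data layout and names are hypothetical.

def resolved_rate(verdicts: dict) -> float:
    """Fraction of instances whose patch passes the given test suite."""
    return sum(verdicts.values()) / len(verdicts)

def rescore(original: dict, strengthened: dict) -> None:
    """original/strengthened: {agent: {instance_id: passed(bool)}}."""
    before = {agent: resolved_rate(v) for agent, v in original.items()}
    after = {agent: resolved_rate(v) for agent, v in strengthened.items()}

    rank_before = sorted(before, key=before.get, reverse=True)
    rank_after = sorted(after, key=after.get, reverse=True)
    moved = sum(1 for a in before if rank_before.index(a) != rank_after.index(a))

    for agent in rank_after:
        drop = before[agent] - after[agent]
        print(f"{agent}: {before[agent]:.1%} -> {after[agent]:.1%} (drop {drop:.1%})")
    print(f"{moved}/{len(before)} agents changed rank")
```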
xy retweeted
Daniel Kang @ddkang
SWE-bench Verified is the gold standard for evaluating coding agents: 500 real-world issues + tests by OpenAI. Sounds bullet-proof? Not quite. We show passing its unit tests != matching ground truth. In our ACL paper, we fixed buggy evals: 24% of agents moved up or down the leaderboard! 1/7
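The "passing its unit tests != matching ground truth" point is easy to see with a toy example (entirely hypothetical, not an instance from the benchmark): a weak test accepts a semantically wrong patch that a strengthened test rejects.

```python
# Toy illustration: a weak test accepts a wrong patch; a strengthened test does not.

def gold_abs(x):           # ground-truth fix
    return x if x >= 0 else -x

def agent_abs(x):          # agent's "solved" patch: wrong for negative inputs
    return x

def weak_test(f):          # original test only covers the easy case
    assert f(3) == 3

def strengthened_test(f):  # added case exposes the divergence
    assert f(3) == 3
    assert f(-3) == 3

weak_test(agent_abs)                 # passes -> counted as resolved
strengthened_test(gold_abs)          # ground truth still passes
try:
    strengthened_test(agent_abs)     # fails -> no longer counted as resolved
except AssertionError:
    print("agent patch rejected by the strengthened test")
```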
xy retweeted
Zhaopeng Tu @tuzhaopeng
Can image safeguards be bypassed by breaking harmful prompts into harmless steps?
⚠️ Introducing Chain-of-Jailbreak (CoJ) Attack, a novel method showing how image generation models (e.g., GPT-4V/o, Gemini 1.5) can be compromised by decomposing malicious queries into a step-by-step editing process. #ACL2025NLP
🔓 Our CoJ attack method bypasses safeguards by guiding models to iteratively edit images through a sequence of seemingly harmless sub-queries. This approach successfully jailbreaks advanced models in over 60% of cases.
📊 We built CoJ-Bench, a comprehensive dataset featuring 9 safety scenarios, 3 types of editing operations, and 3 editing elements, to systematically evaluate these vulnerabilities.
🛡️ We also propose 'Think-Twice Prompting,' a simple yet effective defense that prompts models to pre-visualize and describe images before generation, successfully defending against over 95% of CoJ attacks.
🧑‍💻 Project: github.com/Jarviswang94/C…
📃 Paper: arxiv.org/abs/2410.03869
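The Think-Twice Prompting defense mentioned above is a prompt-level wrapper: ask the model to describe the image it would produce before actually producing it, and screen that description. A minimal sketch of the idea, where `generate`, `edit_image`, and `is_harmful` are hypothetical callables and the prompt wording is illustrative (the paper's exact prompt may differ):

```python
# Sketch of a Think-Twice-style guard around an image-editing request.
# generate(prompt) -> str, edit_image(image, instruction) -> image, and
# is_harmful(text) -> bool are assumed interfaces, not a real library API.

def think_twice_edit(image, instruction, generate, edit_image, is_harmful):
    preview_prompt = (
        "Before editing anything, describe in detail the final image that would "
        f"result from applying this instruction: {instruction!r}"
    )
    preview = generate(preview_prompt)           # pre-visualize the outcome
    if is_harmful(preview):                      # screen the described result
        return None, "refused: previewed result violates the safety policy"
    return edit_image(image, instruction), "ok"  # only then perform the edit
```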
xy retweeted
Zhaopeng Tu @tuzhaopeng
Can MLLMs truly "see" safety risks in image-text combinations?
🌲🖼️ Introducing MMSafetyAwareness, the first comprehensive benchmark for multimodal safety awareness in MLLMs, featuring 1,500 image-prompt pairs across 29 safety scenarios to evaluate whether models correctly identify unsafe content and avoid over-sensitivity. #ACL2025NLP
🔍 Key findings:
❌ Major safety gaps: Current MLLMs (e.g., GPT-4V) misclassify 36.1% of unsafe inputs as safe.
🚧 Harmful over-sensitivity: Models like GPT-4V flag 59.9% of benign inputs as unsafe, undermining helpfulness.
🚨 Unmet challenges: 3 tested methods (prompting, visual contrastive decoding, vision-centric fine-tuning) failed to resolve these issues, highlighting profound unsolved risks.
🧑‍💻 Project: github.com/Jarviswang94/M…
📃 Paper: arxiv.org/abs/2502.11184
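The two headline rates (unsafe inputs missed, benign inputs over-flagged) are straightforward to compute once each image-prompt pair carries a ground-truth label and the model's safe/unsafe judgement. A minimal sketch under that assumed record format:

```python
# Hypothetical record format: (ground_truth_unsafe: bool, predicted_unsafe: bool).

def safety_awareness_rates(records):
    unsafe = [pred for gt, pred in records if gt]
    benign = [pred for gt, pred in records if not gt]
    return {
        # e.g. 0.361 here would correspond to the GPT-4V miss rate quoted above
        "unsafe_missed": sum(1 for p in unsafe if not p) / len(unsafe),
        # e.g. 0.599 here would correspond to the over-sensitivity figure
        "benign_flagged": sum(1 for p in benign if p) / len(benign),
    }
```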
xy retweeted
Zhaopeng Tu @tuzhaopeng
When eyes and memory clash, who wins?
👁️🧠 Introducing a comprehensive study on vision-knowledge conflicts in MLLMs, where visual input contradicts the model's internal commonsense knowledge—and the results might surprise you. #ACL2025NLP
📈 We developed an automated framework to generate the ConflictVis benchmark: 374 original images with 1,122 QA pairs designed to test when MLLMs see one thing but "know" another.
📊 Shocking findings across 9 leading MLLMs:
1⃣ ~20% over-reliance on parametric knowledge over visual evidence
2⃣ Yes-No questions show 43.6% memorization bias (Claude-3.5-Sonnet)
3️⃣ Action-related conflicts are 10.4% more problematic than place conflicts
👀 We propose a "Focus-on-Vision" prompting strategy that significantly improves performance by instructing models to prioritize what they see over what they remember. Despite improvements, vision-knowledge conflicts remain a persistent challenge for multimodal AI systems.
📃 Paper: arxiv.org/abs/2410.08145
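Focus-on-Vision is a prompting intervention: prepend an instruction telling the model to trust the image over its parametric knowledge. A minimal sketch with illustrative wording and a hypothetical `mllm_call(image, prompt)` interface (not the paper's exact prompt):

```python
# Sketch of Focus-on-Vision prompting (illustrative instruction text).

FOCUS_ON_VISION = (
    "Answer based only on what is visible in the image. "
    "If the image contradicts your prior knowledge, trust the image."
)

def ask_with_focus_on_vision(mllm_call, image, question):
    """mllm_call(image, prompt) -> str is an assumed multimodal model interface."""
    return mllm_call(image, f"{FOCUS_ON_VISION}\n\nQuestion: {question}")
```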
xy retweeted
Zhaopeng Tu @tuzhaopeng
Trust your AI, but can it trust itself? 🤔
Introducing an online reinforcement learning framework, RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level up BOTH their problem-solving AND self-checking skills!
🧐 Problems tackled:
✅ "Superficial self-reflection" — models failing to verify their own reasoning robustly.
✅ Separation between reasoning and self-verification training.
🚀 RISE empowers models to critique their OWN reasoning via on-the-fly feedback and verifiable rewards, promoting stronger, more dynamic reasoning loops and effective self-assessment skills.
📊 Key results:
📈 Up to 2.8× better self-verification accuracy on challenging math tasks.
📈 Outperforms instruction-tuned models (Qwen2.5): +3.7% in reasoning, +33.4% in verification accuracy.
📈 Better internal reasoning: frequent, more accurate verification behaviors.
🧑‍💻 Code: github.com/xyliu-cs/RISE
📃 Paper: arxiv.org/abs/2505.13445
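At its core, this kind of setup couples a verifiable solution reward with a self-verification reward so both skills improve together. A minimal sketch of one way to combine the two signals (the weighting and exact reward shaping in the paper may differ):

```python
# Sketch of a combined solve + self-verify reward (illustrative, not RISE's exact design).

def combined_reward(solution_correct: bool, self_verdict: bool,
                    verify_weight: float = 0.5) -> float:
    """solution_correct: checked against a verifiable ground-truth answer;
    self_verdict: the model's own judgement of whether its solution is correct."""
    solve_r = 1.0 if solution_correct else 0.0
    verify_r = 1.0 if self_verdict == solution_correct else 0.0
    return solve_r + verify_weight * verify_r
```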
xy @xyliu_cs
@HumataAI I really enjoyed your product. Could I be moved up from the waitlist? :)
Vala Afshar @ValaAfshar
perspective
xy retweeted
Science girl @sciencegirl
Newton's first law of motion: an object remains in the same state of motion unless acted upon by a force. Huge winds act on this water's natural propensity to follow gravity. The magnitudes of the two forces are equal, and their directions are opposite.
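Stated as a force balance (a minimal restatement of the claim, not a full fluid-dynamics treatment):

```latex
% Equal-magnitude, opposite-direction forces cancel, so by Newton's first law
% the state of motion does not change:
\vec{F}_{\text{wind}} + \vec{F}_{\text{gravity}} = \vec{0}
\quad\Longrightarrow\quad
\frac{d\vec{v}}{dt} = \vec{0} \quad (\text{constant velocity})
```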