OpenAI just confirmed (openai.com/index/why-we-n…): SWE-Bench Verified has flawed tests that reject correct solutions -- in 59.4% of the 27.6% subset they audited. Their recommendation: stop using Verified and switch to Pro.
But is Pro safe? We tested it. SWE-ABS strengthens 64.7% of 150 sampled SWE-Bench Pro instances -- weak tests are not a Verified-only problem.
Instead of abandoning SWE-Bench Verified, we fix the tests. SWE-ABS rejects 19.78% of "solved" patches from the top 30 agents as semantically wrong, leading to a 14.56% average drop in resolved rate -- and all 30 agents' rankings change.
Introducing SWE-ABS: adversarial benchmark strengthening for code-agent evaluation.
Paper: arxiv.org/abs/2603.00520
Code: github.com/OpenAgentEval/…
Data: huggingface.co/datasets/OpenA…
