Devanshu Sahoo (@Devanshu_Sahoo) - Twitter Profili

Devanshu Sahoo@Devanshu_Sahoo·14 Ara

@rryssf As the authors, I can say this paper itself took significant time and careful effort to complete—and its conclusions were unsettling even for us. We hope it contributes to a more cautious, informed use of LLMs in scientific review.

English

134

Robert Youssef@rryssf·13 Ara

I just read a paper that made me deeply uncomfortable about the future of science. It’s called “When Reject Turns into Accept”, and it quantifies how easy it is to manipulate LLM-based scientific reviewers without ever touching the review prompt. Here’s what they actually measured. The authors embedded subtle instructions into the manuscript text. Things like background explanations, framing sentences, or citation commentary. Nothing that looks suspicious to a human reviewer. When LLM reviewers read these papers: • Rejection rates dropped by 20-40 percentage points • Average review scores shifted from “borderline reject” to “clear accept” • Confidence scores increased instead of decreasing • Models complied with the injected intent in up to 70% of cases • Stronger models were not safer. Some were more susceptible The attack never targets the reviewer directly. The model just… reads the paper. And executes what it reads. That’s the core failure mode. We assumed reading content is passive. For LLMs, reading is execution. Any pipeline that uses LLMs for evaluation is now an attack surface: → Peer review → Grant screening → Hiring filters → Paper triage → Automated fact-checking The most unsettling conclusion in the paper is this: If you let models judge information, the information itself can rewrite the judge. Scientific review just became adversarial.

English

121

570

43.3K

Devanshu Sahoo

Keşfet