Keping Bi

20 posts

@bikeping

Joined May 2010

17 Following · 20 Followers

Keping Bi retweeted

Shiyu Ni @Shictyu · Apr 9

Excited to share that our paper "How Long Reasoning Chains Influence LLMs' Judgment of Answer Factuality" (w/ @bikeping) has been accepted to ACL 2026, with an Oral recommendation from the Senior Area Chair! 🎉
Paper: arxiv.org/pdf/2604.06756
Code: github.com/Trustworthy-In…

Here's what we did 🧵

💡 Motivation
LLM-as-a-Judge often fails because judges don't know the correct answer and have no extra information to reference. Can the reasoning trace serve as additional evidence that helps judges judge more accurately?

📖 Concrete example
Q: Who was the first Nobel Physics laureate?
A: Einstein
The judge doesn't know whether that's right. But the reasoning says "Einstein won his first Nobel in 1921", while the first prize was awarded in 1901. Caught! 🙅

Sounds great… but is it really that simple?

🔦 What we did
TL;DR: Reasoning traces are a double-edged sword. LLM judges still can't consistently distinguish "actually correct" from "sounds correct."

We studied this across 4 datasets × 10+ judge models (GPT-4o, Claude Sonnet 4.5, DeepSeek-v3.1…). Two key findings:

❌ Weak judges are almost completely fooled. In NQ, only 23.2% of answers are correct, but weak models accept up to 88% when the reasoning looks fluent. They judge style, not substance.

✅ Strong judges are smarter, but not perfect. DeepSeek-v3.1's alignment improves from 63.4% to 76.2% on NQ. But even strong judges get misled by high-quality reasoning chains.

Just like humans: non-experts get sweet-talked, experts push back 😄

Controlled experiments on reasoning chain features:
1. Fluency is the first gate: break the reasoning flow, and most models mark the answer "incorrect," even if it's right.
2. Factuality is important: counterfactuals reduce pass rates, but adding more errors doesn't increase sensitivity; the evaluator isn't counting them.
3. Position matters: errors at the start hurt most; errors at the end matter less.

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Dec 12

SIGIR-AP2025 has successfully concluded! Hope that everyone has a safe trip back :) Here is a summary of the event: mp.weixin.qq.com/s/aYKHW54wj6aZ… Looking forward to seeing you next year!

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Nov 18

The full SIGIR-AP 2025 program schedule is now available! Check it out here: sigir-ap.org/sigir-ap-2025/… #SIGIRAP2025

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Oct 15

🌏 Visa Invitation for SIGIR-AP 2025 If you need a Chinese visa to attend SIGIR-AP, please send your full name, gender, date of birth, passport number, and institution to registration2025@sigir-ap.org. We’ll issue your official invitation letter as soon as possible. #SIGIRAP2025

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Oct 13

Don't miss out! Register for SIGIR-AP 2025 by October 19th to lock in your early-bird discount. sigir-ap.org/sigir-ap-2025/… #SIGIRAP2025 #EarlyBird

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Sep 3

China has recently introduced a trial policy allowing Russian citizens to enter visa-free for up to 30 days, from Sep. 15, 2025, to Sep. 14, 2026. We warmly welcome Russian researchers and students to join us at SIGIR-AP in Xi'an! Check this out: sigir-ap.org/sigir-ap-2025/….

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Aug 21

Join us at SIGIR-AP for two exciting workshops: R3AG 2025: The Second Workshop on Refined and Reliable Retrieval-Augmented Generation, and BREV-RAG: Beyond Relevance-based EValuation of RAG systems. Submit your work by September 30! Learn more at sigir-ap.org/sigir-ap-2025/….

Keping Bi retweeted

Shiyu Ni @Shictyu · Jul 30

Our paper "Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception" will be presented on July 30th from 11:00 to 12:30 at Hall 5X, #195. Feel free to drop by for a discussion! #ACL2025NLP

Keping Bi retweeted

Shiyu Ni @Shictyu · Jul 16

🥳Happy to share that our paper "Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception" has been accepted by #ACL2025! We explore leveraging LLMs' internal states to improve their knowledge boundary perception from efficiency and risk perspectives.

Keping Bi @bikeping · Jul 4

Papers not accepted by ICTIR are also welcome in the SIGIR Revise-and-Resubmit (R&R) track :)

SIGIR-AP 2025@ACMSIGIR_AP

#SIGIRAP2025 accepts "SIGIR-Revise-and-Resubmit" submissions! Learn more here: sigir-ap.org/sigir-ap-2025/…. ⏰Deadline: July 15

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Jul 2

🚨 Deadline Extended! 🚨 The #SIGIRAP2025 submission deadline is extended to July 15. You now have two more weeks to finalize your work and submit it!

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Jun 20

If you have concerns about obtaining a visa to China to attend #SIGIRAP2025, please note that there are now multiple visa-free routes for many countries, and the standard F/L visa process remains straightforward. Please check the visa information: sigir-ap.org/sigir-ap-2025/…

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Jun 20

We launched a webpage about visiting Xi'an: sigir-ap.org/sigir-ap-2025/…. Xi'an is one of China's Four Great Ancient Capitals, with a rich history spanning over 3,000 years. It has been the capital of 13 dynasties. We welcome your submissions and look forward to seeing you in Xi'an!

Keping Bi retweeted

Yuchen Wen @YuchenWen1027 · Jun 19

😎 Our paper "Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective" has been accepted to #acl2025, w/ @bikeping etc. We propose a psychometric-inspired framework to induce and evaluate implicit bias in LLMs. Project webpage: yuchenwen1.github.io/ImplicitBiasEv…

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Jun 18

#SIGIRAP2025 calls for workshop proposals. Deadline: July 1, 2025 Details: sigir-ap.org/sigir-ap-2025/…

Keping Bi retweeted

SIGIR-AP 2025 @ACMSIGIR_AP · Apr 11

The official website for SIGIR-AP 2025 is now live! Please visit: sigir-ap.org/sigir-ap-2025. This year, we are also inviting industry papers. We also encourage authors of unsuccessful SIGIR submissions to consider submitting to SIGIR-AP. We look forward to seeing you in Xi'an!

Keping Bi retweeted

Wanqing Cui @WanqingCui · Jun 4

code: github.com/VickiCui/MORE paper: arxiv.org/pdf/2402.13625

Keping Bi retweeted

Wanqing Cui @WanqingCui · Jun 4

Our paper "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning" got accepted by #acl2024, w/ @bikeping etc. We propose a novel retrieval augmentation framework that leverages both text and images to enhance the commonsense ability of language models.

Keping Bi retweeted

Shiyu Ni @Shictyu · Jun 4

@bikeping Paper link: arxiv.org/pdf/2402.11457 Code link: github.com/ShiyuNee/When-…

Keping Bi retweeted

Shiyu Ni @Shictyu · Jun 4

Our paper "When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation" got accepted by #acl2024, w/ @bikeping etc. We explore effective and efficient adaptive RAG by enhancing LLMs' perception of their knowledge boundaries.
