Gabrielle Kaili-May Liu

50 posts

@pybeebee

PhD Student in Computer Science @Yale

Joined September 2015
148 Following · 92 Followers
Gabrielle Kaili-May Liu@pybeebee·
Traveling to #EMNLP2025 to present our work “MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs” this week! 🇨🇳 Come by & let's chat about LLM faithfulness / uncertainty / calibration! 📍Poster Session 7 @ Hall C3 🗓Fri, Nov 7 @ 2-3:30p 🔗tinyurl.com/metafaith
Gabrielle Kaili-May Liu@pybeebee

🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨ Check out more details below!👇

Gabrielle Kaili-May Liu@pybeebee·
🚀 RAG systems excel at answering questions—but what happens when the corpus has NO answer or complex multi-hop reasoning is required? Moreover, how can we build benchmarks to stress-test RAG systems in such settings in a realistic way? See our new preprint to find out! 🧵👇
Gabrielle Kaili-May Liu@pybeebee·
Excited to present this at #EMNLP2025 in just over a month! It turns out that even flagship models like GPT-5 still struggle at faithfully expressing uncertainty 🤔 📊 Full results for the newest models are now live👇 arxiv.org/abs/2505.24858
Gabrielle Kaili-May Liu@pybeebee

🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨ Check out more details below!👇

Gabrielle Kaili-May Liu reposted
Alan Li@alanli2020·
1/9 🚀 New paper: Demystifying Scientific Problem-Solving in LLMs — How does reasoning enhancement affect knowledge recall, and do LLMs benefit from external knowledge complimentary to reasoning? Tldr; 📊 SciReas: holistic and efficient evaluation suite for scientific reasoning 🧠 KRUX: a novel framework to study knowledge vs reasoning in LLMs 🔑 Findings: knowledge is a bottleneck; reasoners + in-context knowledge help; long CoT helps knowledge recall/utilization
Gabrielle Kaili-May Liu@pybeebee·
🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨ Check out more details below!👇
Gabrielle Kaili-May Liu@pybeebee

🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):

Gabrielle Kaili-May Liu@pybeebee·
I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹 Come by if you’re interested in multi-doc reasoning and/or scalable creation of high-quality post-training data! 📍 Poster Session 4 @ Hall 4/5 🗓️ Wed, July 30 | 11-12:30 🔗 aclanthology.org/2025.acl-long.…
Gabrielle Kaili-May Liu@pybeebee

🔥Thrilled to introduce MDCure: A Scalable Pipeline for Multi-Document Instruction-Following 🔥 How can we systematically and scalably improve LLMs' ability to handle complex multi-document tasks? Check out our new preprint to find out! Details in 🧵 (1/n):

Gabrielle Kaili-May Liu reposted
Sophia S. Han@HanSineng·
Excited to see more investigation into LLM creativity. We have some pioneering work on this topic as well: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models. arxiv.org/pdf/2505.10844.
Yiyou Sun@YiyouSun

🚨 New study on LLM's reasoning boundary! Can LLMs really think out of the box? We introduce OMEGA—a benchmark probing how they generalize: 🔹 RL boosts accuracy on slightly harder problems with familiar strategies, 🔹 but struggles with creative leaps & strategy composition. 👇

Gabrielle Kaili-May Liu@pybeebee·
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):