Xilun Chen

732 posts


@ccsasuke

Research Scientist @ Meta FAIR

Seattle, WA · Joined March 2010
513 Following · 564 Followers
Pinned Tweet
Xilun Chen @ccsasuke
Introducing FLAME 🔥: Factuality-Aware Alignment for LLMs. We found that the standard alignment process **encourages** hallucination. We therefore propose factuality-aware alignment, which improves factuality while maintaining the LLM's general instruction-following capability. arxiv.org/abs/2405.01525
3 replies · 8 reposts · 35 likes · 7K views
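The tweet gives the finding (standard alignment encourages hallucination) rather than the recipe, but the core of factuality-aware alignment can be pictured as preference data ranked by a factuality verifier instead of generic helpfulness alone. A minimal sketch, assuming a hypothetical `factuality_score` judge (not FLAME's actual interface):

```python
# Hedged sketch of factuality-aware preference data, not FLAME's actual
# pipeline: `factuality_score` stands in for any claim-level verifier
# (e.g., a retrieval-backed fact checker).
from typing import Callable

def build_preference_pair(prompt: str, samples: list[str],
                          factuality_score: Callable[[str], float]) -> dict:
    """For a fact-seeking prompt, prefer the most factual sample so that
    DPO-style training rewards factuality rather than confident guessing."""
    ranked = sorted(samples, key=factuality_score, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

# Toy usage with a stand-in scorer:
pair = build_preference_pair("Who wrote Hamlet?",
                             ["Shakespeare.", "Christopher Marlowe."],
                             factuality_score=lambda s: s == "Shakespeare.")
print(pair["chosen"])  # Shakespeare.
```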
Xilun Chen retweeted
Akari Asai @AkariAsai
1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality, and I'm open to bold new ideas! FAQ in 🧵
19 replies · 120 reposts · 641 likes · 147.6K views
Xilun Chen retweeted
Gargi Ghosh @gargighosh
New research from FAIR: Active Reading, a framework for learning a given set of material with self-generated learning strategies, for both general and expert domains (such as finance). Models absorb significantly more knowledge than with vanilla finetuning and the usual data-augmentation strategies.
Jessy Lin @realJessyLin

🚀 We're excited to release the 1T Active Reading-augmented Wikipedia dataset and open-source the WikiExpert model for the community:
Paper: arxiv.org/abs/2508.09494
Dataset: huggingface.co/datasets/faceb…
Model: huggingface.co/facebook/meta-…
Thanks to my great collaborators @vinceberges, @ccsasuke, @scottyih, @gargighosh, @barlas_berkeley! 🧵 [n/n]

0 replies · 11 reposts · 28 likes · 5K views
Xilun Chen retweeted
Jessy Lin @realJessyLin
🔍 How do we teach an LLM to 𝘮𝘢𝘴𝘵𝘦𝘳 a body of knowledge? In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results:
* 𝟔𝟔% on SimpleQA with an 8B model that studied the Wikipedia docs (+𝟑𝟏𝟑% vs. plain finetuning)
* a domain-specific expert model: 𝟏𝟔𝟎% vs. FT on FinanceBench knowledge
* an 8B Wikipedia expert competitive with 405B on factuality (💥 open-sourced!)
🧵 [1/n]
15 replies · 150 reposts · 1.1K likes · 132.1K views
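The mechanism described in the thread: the model generates its own study material from each source document and is then finetuned on that augmented data. A minimal sketch, with `generate` as an assumed stand-in for any LLM completion call; the paper's learning strategies are self-proposed by the model rather than this fixed list:

```python
# Sketch of the Active Reading idea described above: the model "studies"
# a document by generating its own learning material, which then becomes
# additional finetuning data. `generate` is an assumed completion helper.

STRATEGIES = [
    "Paraphrase the passage in your own words.",
    "Write question-answer pairs covering every fact in the passage.",
    "Explain how the passage connects to related background knowledge.",
]

def active_reading(document: str, generate) -> list[str]:
    """Return self-generated study texts for one document."""
    return [generate(f"{s}\n\nPassage:\n{document}") for s in STRATEGIES]

# The outputs would be mixed into the finetuning corpus instead of (or in
# addition to) plain next-token training on the raw document.
```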
Xilun Chen retweeted
Xueguang Ma @xueguang_ma
🚀 Introducing BrowseComp-Plus: a fairer and more transparent evaluation benchmark for deep-research agents, built on top of BrowseComp. It features:
- 📚 a fixed, carefully curated corpus of web documents
- ✅ human-verified positive documents
- ⚔️ web-mined, challenging hard negatives
With BrowseComp-Plus, you can thoroughly evaluate and compare the components of a deep-research system, e.g., GPT-5 + Qwen3-Embedding. Code, dataset, and leaderboard links are at the end of this thread.
10 replies · 35 reposts · 235 likes · 61.5K views
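Because the corpus is fixed and the positives are human-verified, retrieval components can be scored directly and reproducibly. A minimal recall@k sketch under assumed data shapes (not the benchmark's official tooling):

```python
# Sketch of retriever evaluation on a fixed corpus with verified positives,
# as a benchmark like BrowseComp-Plus enables. Data shapes are assumptions.

def recall_at_k(run: dict[str, list[str]],
                qrels: dict[str, set[str]], k: int = 10) -> float:
    """run: query id -> ranked doc ids; qrels: query id -> positive doc ids."""
    hits = 0
    for qid, positives in qrels.items():
        retrieved = set(run.get(qid, [])[:k])
        if retrieved & positives:       # at least one verified positive found
            hits += 1
    return hits / len(qrels)

print(recall_at_k({"q1": ["d3", "d7"]}, {"q1": {"d7"}}, k=2))  # 1.0
```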
Xilun Chen retweeted
Rulin Shao @RulinShao
Factuality and logical reasoning (e.g., math, code) favor different sets of reasoning patterns. 🧑‍🍳 A fresh RL recipe to improve factuality is here — crafted by the amazing @ccsasuke!
Jason Weston @jaseweston

...is today a good day for new paper posts?
🤖 Learning to Reason for Factuality 🤖
📝: arxiv.org/abs/2508.05618
- New reward function for GRPO training of long CoTs for *factuality*
- Design stops reward hacking by favoring precision, detail AND quality
- Improves the base model across all axes
🧵 1/3

0 replies · 5 reposts · 74 likes · 7.7K views
Xilun Chen retweeted
Jason Weston @jaseweston
...is today a good day for new paper posts?
🤖 Learning to Reason for Factuality 🤖
📝: arxiv.org/abs/2508.05618
- New reward function for GRPO training of long CoTs for *factuality*
- Design stops reward hacking by favoring precision, detail AND quality
- Improves the base model across all axes
🧵 1/3
1 reply · 49 reposts · 382 likes · 36.7K views
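The bullet about reward hacking is the interesting design point: rewarding precision alone teaches the policy to make fewer, vaguer claims, so detail and overall quality must be rewarded jointly. One illustrative way to combine the terms (the scorers and weighting are assumptions, not the paper's function):

```python
# Illustrative composite factuality reward in the spirit of the tweet:
# precision alone invites reward hacking (emit fewer claims), so the
# number of claims (detail) and response quality are rewarded jointly.

def factuality_reward(num_supported: int, num_claims: int,
                      quality: float) -> float:
    """quality in [0, 1] from a general response-quality judge."""
    if num_claims == 0:
        return 0.0                      # refusing to claim earns nothing
    precision = num_supported / num_claims
    detail = min(num_claims / 10, 1.0)  # saturating bonus for coverage
    return precision * detail * quality

print(factuality_reward(9, 10, 0.8))   # high precision + detail -> 0.72
```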
Xilun Chen retweeted
Wenting Zhao @wzhao_nlp
Some personal news: I'll join @UMassAmherst CS as an assistant professor in fall 2026. Until then, I'll postdoc at @Meta NYC. Reasoning will continue to be my main interest, with a focus on data-centric approaches 🤩 If you're also interested, apply to work with me (PhDs & a postdoc)!
95 replies · 29 reposts · 849 likes · 71.1K views
Xilun Chen retweeted
AK @_akhaliq
Meta just dropped ReasonIR on Hugging Face: Training Retrievers for Reasoning Tasks.
5 replies · 48 reposts · 308 likes · 41.4K views
Xilun Chen retweeted
AI at Meta @AIatMeta
Today is the start of a new era of natively multimodal AI innovation. Today, we're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.
Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely accepted benchmarks.
Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding, with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding, at half the active parameters.
• Unparalleled performance-to-cost ratio, with a chat version scoring an ELO of 1417 on LMArena.
These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We're excited to share more details about it even while it's still in flight.
Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs
Download Llama 4 ➡️ go.fb.me/bwwhe9
827 replies · 2.4K reposts · 12.8K likes · 3.7M views
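A note on the headline spec: "17B active parameters with 16 (or 128) experts" refers to mixture-of-experts routing, where each token activates only a few experts, so per-token compute tracks active rather than total parameters. A generic top-k routing sketch (illustrative only, not Llama 4's actual implementation):

```python
import torch
import torch.nn as nn

# Textbook top-k mixture-of-experts routing: each token runs only k of the
# n experts, which is why "active parameters" << total parameters.

def moe_forward(x: torch.Tensor, experts: list[nn.Module],
                router: nn.Linear, k: int = 2) -> torch.Tensor:
    """x: [tokens, dim]. Route each token to its top-k experts."""
    weights, idx = router(x).softmax(-1).topk(k, dim=-1)  # [tokens, k]
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e      # tokens whose slot-th pick is e
            if mask.any():
                out[mask] += weights[mask, slot, None] * expert(x[mask])
    return out

experts = [nn.Linear(8, 8) for _ in range(4)]
print(moe_forward(torch.randn(5, 8), experts, nn.Linear(8, 4)).shape)
```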
Xilun Chen retweeted
Zhuang Liu @liuzhuang1234
New paper: Transformers, but without normalization layers (1/n)
76 replies · 578 reposts · 4.1K likes · 1.3M views
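The tweet doesn't say what replaces the normalization layers; in the paper, it is an elementwise Dynamic Tanh, DyT(x) = γ·tanh(αx) + β with a learnable scalar α. A minimal PyTorch sketch of that module (treat initialization details as illustrative):

```python
import torch
import torch.nn as nn

# Dynamic Tanh (DyT): an elementwise drop-in replacement for LayerNorm,
# y = gamma * tanh(alpha * x) + beta, with a learnable scalar alpha and
# per-channel affine parameters. Minimal sketch for illustration.

class DyT(nn.Module):
    def __init__(self, dim: int, alpha0: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha0))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))       # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))       # per-channel shift

    def forward(self, x):
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

print(DyT(8)(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```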
Xilun Chen retweeted
Matthew Finlayson ✈️ ICLR
🧵 Adapting your LLM for new tasks is dangerous! A bad training set degrades models by encouraging hallucinations and other misbehavior. Our paper remedies this for RAG training by replacing gold responses with self-generated demonstrations. Check it out: arxiv.org/abs/2502.10
1 reply · 4 reposts · 7 likes · 458 views
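The proposed fix is simple to picture: rather than finetuning the RAG model on external gold responses, sample candidate responses from the model itself and keep only those that match the gold answer. A minimal sketch, with `generate` and `is_correct` as assumed stand-ins (not the paper's exact pipeline):

```python
# Sketch of self-generated demonstrations for RAG finetuning, per the
# tweet's description: training targets stay in the model's own output
# distribution instead of coming from an external gold response.

def self_demos(question, context, gold_answer, generate, is_correct, n=8):
    """Sample n candidates and keep those matching the gold answer."""
    demos = []
    for _ in range(n):
        response = generate(f"Context: {context}\nQ: {question}\nA:")
        if is_correct(response, gold_answer):
            demos.append({"prompt": question, "response": response})
    return demos  # finetune on these instead of the external gold text
```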
Xilun Chen retweeted
Srini Iyer @sriniiyer88
New paper! Byte-level models are finally competitive with tokenizer-based models, with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: dl.fbaipublicfiles.com/blt/BLT__Patch… (1/n)
2 replies · 22 reposts · 90 likes · 18.5K views
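The "dynamic patching" here groups bytes into variable-length patches, in the paper by cutting a new patch where a small byte-LM's next-byte entropy spikes. A simplified sketch, with `next_byte_entropy` as an assumed helper (details differ from the real Byte Latent Transformer):

```python
# Simplified entropy-based dynamic patching: start a new patch when
# predicting the next byte gets hard (entropy above a threshold), so
# easy-to-predict spans form long, cheap patches.

def entropy_patches(data: bytes, next_byte_entropy, threshold: float = 2.0):
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])   # close patch before hard byte
            start = i
    patches.append(data[start:])
    return patches
```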
Xilun Chen retweeted
Jack Lin @jacklin_64
I will present our paper FLAME on factuality alignment for LLMs with @luyu_gao at #NeurIPS2024! 🎉 Join us at East Exhibit Hall A-C, Booth #3501 for a chat on Wed, Dec 11, 4:30–7:30 pm. Looking forward to connecting! More details: neurips.cc/virtual/2024/p…
Xilun Chen @ccsasuke

Introducing FLAME 🔥: Factuality-Aware Alignment for LLMs. We found that the standard alignment process **encourages** hallucination. We therefore propose factuality-aware alignment, which improves factuality while maintaining the LLM's general instruction-following capability. arxiv.org/abs/2405.01525

0 replies · 5 reposts · 14 likes · 2.7K views
Xilun Chen retweeted
Akari Asai @AkariAsai
🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵
27 replies · 117 reposts · 813 likes · 126.7K views
Xilun Chen retweeted
Minghan @alexlimh23
1/ Excited to share that our paper "NEST🪺: Nearest Neighbor Speculative Decoding for LLM Generation and Attribution" is accepted at #NeurIPS2024! 🚀 Catch us at the poster session on Thu, Dec 12, 4:30–7:30 PM PST, East Exhibit Hall A-C, #2201. [Details: neurips.cc/virtual/2024/p…]
Minghan @alexlimh23

Curious about enhancing factuality and attribution in LLM generation? Check out our paper: arxiv.org/abs/2405.19325 Introducing NEST 🪺: Nearest Neighbor Speculative Decoding for LLM Generation and Attribution, a training-free method that incorporates real-world text into LLM generation.

2 replies · 6 reposts · 24 likes · 11.8K views
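As the quoted tweet describes it, NEST retrieves nearest-neighbor spans from a corpus, proposes them as draft continuations, and lets the base LLM keep the prefix it agrees with, which yields both speed and attribution. A schematic sketch with assumed helpers (not the paper's interface):

```python
# Schematic NEST-style decoding step: a retrieved corpus span serves as a
# draft that the base LLM verifies, speculative-decoding style; accepted
# spans carry provenance. `retrieve_span` and `model_token_probs` are
# assumed helpers.

def nest_step(prefix, retrieve_span, model_token_probs, accept_p=0.3):
    """Accept the longest draft prefix the model itself assigns high
    probability, and return the span's source for attribution."""
    span, source = retrieve_span(prefix)          # nearest-neighbor draft
    accepted = []
    for token in span:
        if model_token_probs(prefix + accepted).get(token, 0.0) < accept_p:
            break
        accepted.append(token)
    return accepted, (source if accepted else None)
```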
Xilun Chen retweeted
Jason Wei @_jasonwei
Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, so we created an eval that is simple, reliable, and easy for researchers to use. Main features of SimpleQA:
1. Very simple setup: 4k diverse fact-seeking questions written by humans, each with a single, indisputable answer. Model completions are graded by an autograder as correct, incorrect, or not attempted.
2. We created it to be challenging for the current class of frontier models; both o1-preview and Claude Sonnet 3.5 are below 50% accuracy.
3. Reference answers have high correctness. Questions are written to be non-ambiguous, and reference answers were verified by two independent annotators. Questions are also written to be timeless, so SimpleQA can remain a useful benchmark even 5 or 10 years from now.
The way I think about evals is that they are an incentive for the AI community. New benchmarks in AI get saturated very quickly, and what they incentivize gets encoded into the next generation of language models. With a good hallucinations eval, hopefully the next wave of language models will be more trustworthy and reliable!
28 replies · 120 reposts · 860 likes · 106.5K views
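The protocol in point 1 translates almost directly into an eval harness: grade each completion as correct, incorrect, or not attempted, and report the rates. A minimal sketch, with `autograde` standing in for the LLM judge (not the eval's actual implementation):

```python
# Minimal SimpleQA-style harness following the protocol described above.
# `autograde(question, reference, completion)` is an assumed judge that
# returns "correct", "incorrect", or "not_attempted".

from collections import Counter

def run_eval(questions, model, autograde):
    grades = Counter(
        autograde(q["question"], q["answer"], model(q["question"]))
        for q in questions
    )
    n = sum(grades.values())
    return {g: grades[g] / n
            for g in ("correct", "incorrect", "not_attempted")}
```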
Xilun Chen retweeted
Lili Yu @liliyu_lili
🚀 Excited to share our latest work: Transfusion! A new multimodal generative training recipe that combines language modeling and image diffusion in a single transformer! Huge shoutout to @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala and other collaborators.
Chunting Zhou @violet_zct

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039 Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This allows us to leverage the strengths of both approaches in one model. 1/5

6 replies · 19 reposts · 124 likes · 58.7K views
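The quoted thread spells out the training recipe: one transformer optimized with next-token prediction on text and a diffusion loss on images within mixed-modality sequences. A schematic of such a combined objective (shapes and the balancing coefficient are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

# Schematic Transfusion-style objective: the same transformer is trained
# with cross-entropy on text tokens and a diffusion (noise-prediction MSE)
# loss on image latents within one mixed-modality sequence.

def transfusion_loss(text_logits: torch.Tensor, text_targets: torch.Tensor,
                     noise_pred: torch.Tensor, noise: torch.Tensor,
                     lam: float = 5.0) -> torch.Tensor:
    lm = F.cross_entropy(text_logits.flatten(0, -2), text_targets.flatten())
    diff = F.mse_loss(noise_pred, noise)   # standard DDPM epsilon loss
    return lm + lam * diff                 # lam balances the two objectives
```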