LLM Evals Workshop @NeurIPS

39 posts

@LLM_eval

NeurIPS 2025 Workshop. Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

San Diego · Joined July 2025

19 Following · 216 Followers

Pinned Tweet

LLM Evals Workshop @NeurIPS @LLM_eval · Jul 22

We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3

29K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 8

It has been a super fun day @LLM_eval workshop @NeurIPSConf with amazing talks, posters, and an engaging panel discussion! @dawnsongtweets @natolambert @orf_bnw @sanmikoyejo @abeirami @hamishivi @MariusHobbhahn @beyzaermis @Diyi_Yang @attaluri_nithya @RishiBommasani @YangjunR

136

16.8K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 8

Our next talk @LLM_eval workshop is by @sanmikoyejo! Upper Level Room 2 @NeurIPSConf

2.8K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 7

“Good researchers obsess over evals” by @natolambert @LLM_eval workshop!

4.6K

LLM Evals Workshop @NeurIPS retweeted

Nithya Attaluri @attaluri_nithya · Dec 7

Bringing the hot take culture to NeurIPS - great talk @orf_bnw!!

Berivan Isik @BerivanISIK

@LLM_eval workshop has started with Orhan Firat’s talk at Upper Level Room 2. @NeurIPSConf

1.9K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 7

@dawnsongtweets is giving a talk on agentic evals @LLM_eval workshop!

1.4K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 7

@LLM_eval workshop has started with Orhan Firat’s talk at Upper Level Room 2. @NeurIPSConf

4.5K

LLM Evals Workshop @NeurIPS retweeted

Nathan Lambert @natolambert · Dec 6

Good researchers obsess over evals. The story of Olmo 3 (post-training), told through evals. NeurIPS talk tomorrow: Upper Level Room 2, 10:35AM.

598

56.6K

LLM Evals Workshop @NeurIPS retweeted

Berivan Isik @BerivanISIK · Dec 2

I’ll be @NeurIPSConf all week and would love to connect on LLM data, evaluation, benchmarking, and scaling laws. If you’re working on related problems, feel free to reach out. PS: Don’t miss our one-of-a-kind workshop on LLM evaluation: sites.google.com/view/llm-eval-…

9.2K

LLM Evals Workshop @NeurIPS @LLM_eval · Nov 27

See you in San Diego on December 7th!

144

LLM Evals Workshop @NeurIPS @LLM_eval · Nov 27

- "The Measure of All Measures: Quantifying LLM Benchmark Quality" -- Jihan Yao, Peter Jin, Ke Bao, Qiaolin Yu et al. openreview.net/forum?id=HpnGm…

749

LLM Evals Workshop @NeurIPS @LLM_eval · Nov 27

🚀 We are thrilled to announce that the LLM Eval Workshop @NeurIPSConf received 244 excellent submissions! 188 papers will be presented in poster sessions, and 5 exceptional works have been selected for oral talks. Check out the accepted papers: sites.google.com/view/llm-eval-… 🧵👇

619

LLM Evals Workshop @NeurIPS retweeted

Huanxin Sheng @HuanxinShe5254 · Nov 26

I will present my #EMNLP2025 paper at the #NeurIPS2025 LLM Eval Workshop @LLM_eval (Dec. 7th, 11:15 - 12:15, Poster Session 2). If you are interested in reliable LLM-as-a-judge, please come say hi! ☕️ #AI #LLM #LLMJudge #LLMEvaluation #ConformalPrediction

Huanxin Sheng @HuanxinShe5254

🤩 My FIRST paper received #EMNLP2025 SAC Highlights: "Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction" Huge thanks to my advisor @jiank_uiuc and collaborators Xinyi Liu, @hangfeng_he, & @jieyuzhao11! #AI #NLP #LLM #ConformalPrediction

997

LLM Evals Workshop @NeurIPS retweeted

Juan Miguel Navarro @JuanMiguelNC · Oct 13

(5/5) Read the paper at: arxiv.org/abs/2510.08616 Looking forward to discussing more at NeurIPS in San Diego!

1.5K

LLM Evals Workshop @NeurIPS retweeted

Juan Miguel Navarro @JuanMiguelNC · Oct 13

(1/5) My work, “LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests”, has been accepted for a contributed talk at @NeurIPSConf 2025 Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling workshop @LLM_eval #NeurIPS #LLM #Evaluation #Robustness #AI #ML

8.4K

LLM Evals Workshop @NeurIPS retweeted

Riccardo Cadei @riccardocadeii · Oct 7

Sketched on a few Parisian summer nights with a friend, @ChrisInterno. If you care about (causal) identification in a semi-synthetic future, we’d value your read and critique. Preprint: arxiv.org/pdf/2509.17999 Accepted at @LLM_eval workshop @NeurIPSConf

256

LLM Evals Workshop @NeurIPS retweeted

Riccardo Cadei @riccardocadeii · Oct 7

The Narcissus Hypothesis: Recursive training on semi-synthetic corpora enforcing human alignment induces a Social Desirability Bias: world-models (Narcissus) aim to please rather than represent, polluting data lakes and charming us (Echo) into hanging on their every word.

1.1K
