Kate Sanders

283 posts

@kesnet50

LLM post-training, reasoning, and multimodality. Ph.D. @jhuclsp, incoming researcher at Microsoft Copilot Tuning.

Cambridge, MA · Joined August 2021
362 Following · 322 Followers
Kate Sanders reposted
Eugene Yang @EYangTW
🚨 Calling all RAG researchers & NLP folks: RAG4Reports is coming to ACL 2026 this July — a workshop + shared tasks dedicated to the hardest version of RAG: long-form, citation-backed, multilingual report generation. Here's why you should care 🧵👇 🔗 rag4reports.github.io
1 reply · 8 reposts · 18 likes · 4K views
Kate Sanders reposted
MAGMaR @MAGMaR_workshop
This year's shared task allows you to submit for the retrieval track, generation track, or full RAG track on a challenging new collection of unedited ("raw") videos. Research Papers (Apr. 1) Shared Task (Apr. 20)
0 replies · 1 repost · 1 like · 101 views
Kate Sanders reposted
MAGMaR @MAGMaR_workshop
📹 + 🧠 + 📝 = 🔥 First call for MAGMaR 2026, the 2nd workshop on multimodal augmented generation via multimodal retrieval! If #RAG isn't hard enough for you, try it multilingually and multimodally. Co-located with @aclmeeting in San Diego in July. nlp.jhu.edu/magmar/
1 reply · 2 reposts · 8 likes · 3.1K views
Kate Sanders @kesnet50
I will be at AAAI 2026 in Singapore next week! ✈️ I'm looking forward to seeing everyone's cool projects and discussing reasoning, post-training, and multimodality. Please reach out if you will be there and would like to connect.
1 reply · 2 reposts · 4 likes · 791 views
Kate Sanders reposted
Eugene Yang @EYangTW
🌍 Excited to announce: WSDM Cup 2026 Multilingual Retrieval is LIVE! Ever wondered how to build search systems that work across languages? We're challenging you to query in English and retrieve from Chinese, Persian, and Russian documents simultaneously. Ready to join? 🧵👇
1 reply · 8 reposts · 12 likes · 6.2K views
Kate Sanders reposted
Benjamin Van Durme @ben_vandurme
Compactor: Calibrated LLM KV cache Compression. 50% cache size with ~ZERO performance loss! compactor-vllm: inference engine for KV compression. Similar speed to vllm-v1 and 15x faster than NVIDIA KVPress, unlocks practical KV compression.
2 replies · 5 reposts · 21 likes · 1.3K views
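The kind of KV-cache compression Compactor advertises can be pictured with a toy sketch: score each cached token (here by a made-up attention score) and keep only the top half of the keys and values. Everything below, including the `compress_kv_cache` helper, the random scores, and the 50% keep ratio, is a hypothetical illustration, not Compactor's actual calibrated method.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_scores, keep_ratio=0.5):
    """Toy KV-cache compression: keep the tokens with the highest
    attention scores, dropping the rest (illustrative only)."""
    n = keys.shape[0]
    k = max(1, int(n * keep_ratio))
    # Indices of the k most-attended tokens, kept in original order.
    top = np.sort(np.argsort(attn_scores)[-k:])
    return keys[top], values[top]

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))    # 8 cached tokens, head dim 4
values = rng.normal(size=(8, 4))
scores = rng.random(8)            # stand-in for attention statistics
ck, cv = compress_kv_cache(keys, values, scores)
print(ck.shape)  # (4, 4)
```

The real trade-off, which this sketch ignores, is choosing a scoring statistic whose pruning provably changes the attention output as little as possible.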
Kate Sanders reposted
Liaoyaqi Wang @LiaoyaqiW
🚀 Thrilled to share our new work, "Always Tell Me The Odds", at COLM 2025! LLMs struggle with accurate probability predictions, often giving coarse answers. We train decoder-based models to provide fine-grained, calibrated probabilities, significantly outperforming strong baselines!
1 reply · 4 reposts · 14 likes · 4.9K views
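The coarse-vs-fine-grained gap described above can be made concrete with the Brier score, a standard measure of probabilistic accuracy. This is a generic illustration with invented numbers; the paper's own metrics and models may differ.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1
    outcomes; lower means better-calibrated predictions."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

outcomes = [1, 0, 1, 1, 0]
# A coarse predictor that only ever says 0.0 or 1.0 ...
coarse = [1.0, 0.0, 1.0, 0.0, 0.0]   # confident but wrong on the 4th item
# ... versus a fine-grained predictor with intermediate probabilities.
fine = [0.9, 0.2, 0.8, 0.6, 0.1]

print(round(brier_score(coarse, outcomes), 3))  # 0.2
print(round(brier_score(fine, outcomes), 3))    # 0.052
```

One badly wrong all-or-nothing answer costs more than several mildly uncertain ones, which is why fine-grained calibrated outputs win under proper scoring rules.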
Kate Sanders reposted
William Fleshman @willcfleshman
Did you know that LoRA A matrices can be frozen at init w/o degrading performance? 🤯 We leverage this trick to construct an unsupervised routing procedure that achieves identical performance to the previous best with orders of magnitude fewer FLOPs and ~50% less GPU memory. 🧵
1 reply · 6 reposts · 11 likes · 2.4K views
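The frozen-A trick follows from the structure of LoRA itself: the adapted layer computes W x + B(A x), so a random frozen A merely fixes a low-dimensional projection and all adaptation can live in B. A minimal NumPy sketch, where the dimensions, learning rate, and squared-error objective are all invented for illustration (this is not the paper's routing procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                              # model dim, LoRA rank (toy sizes)

W = rng.normal(size=(d, d))               # pretrained weight (frozen)
A = rng.normal(size=(r, d)) / np.sqrt(d)  # LoRA A: random init, FROZEN
B = np.zeros((d, r))                      # LoRA B: the only trained part

def forward(x):
    # Adapted layer: W x + B (A x); with B = 0 it equals the base layer.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
assert np.allclose(forward(x), W @ x)     # identical to base at init

# One toy gradient step on B alone, for L = 0.5 * ||forward(x) - target||^2.
target = rng.normal(size=d)
err = forward(x) - target
B -= 0.01 * np.outer(err, A @ x)          # dL/dB = err (A x)^T; A untouched
print(np.linalg.norm(forward(x) - target) < np.linalg.norm(err))  # True
```

Since A never changes, its gradients and optimizer state disappear entirely, which is where the FLOP and memory savings come from.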
Kate Sanders reposted
Marc Marone @ruyimarone
3T tokens, ~1800 languages, 2 models - we’re releasing mmBERT, a modern multilingual encoder model!
11 replies · 67 reposts · 400 likes · 31K views
Kate Sanders reposted
Aleksa Gordić (水平问题) @gordic_aleksa
New in-depth blog post: "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines, and vLLM in particular, work! It took me a while to understand the codebase at this level and then to write it all up; I quickly realized I had underestimated the effort. 😅 It could easily have been a book/booklet (lol).

I covered:
* Basics of the inference engine flow (input/output request processing, scheduling, paged attention, continuous batching)
* "Advanced" stuff: chunked prefill, prefix caching, guided decoding (grammar-constrained FSM), speculative decoding, disaggregated P/D
* Scaling up: from smaller LMs that fit on a single GPU all the way to trillion+ parameters (via TP/PP/SP), i.e. multi-GPU, multi-node setups
* Serving the model on the web: from offline deployment to multiple API servers, load balancing, the DP coordinator, and multi-engine setups :)
* Measuring the performance of inference systems (latency: TTFT, ITL, E2E, TPOT; throughput) and the GPU roofline model

Lots of examples, lots of visuals!

---

I realize I've been silent on social; many of you noticed, and thanks for reaching out! :) I'm so back! Lots of things happened. Also, in general, I'm a bit sick of superficial content; it really is the equivalent of junk food (h/t @karpathy). I want to do the best, deepest technical work of my life over the next few years and write much more in depth (high-quality organic food ;)), so I might not be as frequent around here as I used to be (? we'll see). I'll aim to share a few paper summaries a week, or whatever is relevant / in the zeitgeist. If there are topics from the past few weeks or months you'd like covered, drop them in the comments and I might focus on some of them in my next posts.

---

Huge thank you to @Hyperstackcloud for providing an H100 node to run some of the experiments and analysis I needed for this write-up. The team there, led by Christopher Starkey, is amazing!
Also a big thank you to Nick Hill, who did a very thorough review of the post (basically a code review lol; Nick is a core vLLM contributor and principal SWE at Red Hat), and to my friends Kyle Krannen (NVIDIA Dynamo), @marksaroufim (PyTorch), and @ashVaswani (goat) for taking the time during the weekend when they didn't have to!
63 replies · 401 reposts · 2.6K likes · 323.5K views
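Of the topics the post covers, paged attention is the easiest to sketch: the KV cache is carved into fixed-size physical blocks, and each sequence keeps a block table mapping logical token positions to blocks allocated on demand, so memory is never reserved for a sequence's maximum length up front. A toy version of that bookkeeping follows; the block size, class, and method names are all invented for illustration, and vLLM's real block manager is far more involved.

```python
BLOCK_SIZE = 4  # tokens per KV block (toy value; vLLM uses e.g. 16)

class PagedKVCache:
    """Toy block-table bookkeeping in the spirit of paged attention."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical blocks
        self.tables = {}    # seq_id -> list of physical block ids
        self.lengths = {}   # seq_id -> tokens written so far

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full: allocate a new one
            self.tables.setdefault(seq_id, []).append(self.free.pop(0))
        self.lengths[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        # Translate a logical token position into a slot in GPU memory.
        block = self.tables[seq_id][pos // BLOCK_SIZE]
        return block * BLOCK_SIZE + pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                      # write 6 tokens for one sequence
    cache.append_token("seq0")
print(cache.tables["seq0"])             # [0, 1]  (two blocks in use)
print(cache.physical_slot("seq0", 5))   # 5  (block 1, offset 1)
```

Because blocks are allocated only as tokens arrive and returned to the free pool when a sequence finishes, many more sequences can be batched into the same GPU memory, which is what enables continuous batching at high throughput.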
Kate Sanders reposted
Isabel Cachola @isabelcachola
Our work on readability evaluation for Plain Language Summarization will appear at #EMNLP2025!! @DanielKhashabi @mdredze Paper: arxiv.org/pdf/2508.19221 TLDR: Traditional readability metrics correlate poorly with human judgements & LMs consider deeper readability features. 1/6
1 reply · 17 reposts · 46 likes · 4.3K views
Kate Sanders reposted
Kate Sanders @kesnet50
Taking off for Vienna #ACL2025! 🇦🇹 Excited to talk with people about transparent reasoning, multimodality, and fact verification. Stop by our multimodal RAG workshop on Friday 🔥🔥🔥 x.com/MAGMaR_worksho… Please reach out if you want to grab coffee!
MAGMaR @MAGMaR_workshop

New Workshop on Multimodal Augmented Generation via MultimodAl Retrieval (MAGMaR) to be held at @aclmeeting ACL in Vienna this summer. We have a new shared task that stumps most LLMs - including ones pretrained on our test collection. nlp.jhu.edu/magmar/

5 replies · 2 reposts · 6 likes · 937 views
Kate Sanders reposted
Eugene Yang @EYangTW
🚨Wouldn’t it be nice if your agentic search system could reason over all your docs? ✨Introducing Rank-K, a listwise reranker that benefits from test-time compute and long-context! Rank-K sets a new SoTA for reasoning-based reranking, without reasoning chains from other models.
2 replies · 28 reposts · 192 likes · 21.5K views