Fabian David Schmidt
@fdschmidt

99 posts

Member of Technical Staff @ Cohere | PhD from Uni of Würzburg on multilinguality & multimodality | prev. Mila & LTL@UniCambridge

Joined December 2022
108 Following · 176 Followers

Pinned Tweet
Fabian David Schmidt @fdschmidt
Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports cross-lingual NLU in 200+ languages🔥 Joint work w/ Philipp Borchert, @licwu, and @gg42554 during my great research stay at @cambridgeltl
3 replies · 18 retweets · 100 likes · 12.8K views
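The fusion described in the pinned tweet can be caricatured as a small trainable projection that maps pooled token states from a frozen multilingual encoder into the LLM's representation space, with both backbones frozen. This is a toy numpy sketch of that general recipe only, not the paper's actual architecture; every dimension, name, and the mean-pooling choice here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: encoder hidden size vs. LLM hidden size.
D_ENC, D_LLM, SEQ = 1024, 4096, 16

def mean_pool(hidden, mask):
    """Average token states, ignoring padded positions."""
    mask = mask[:, None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=0) / mask.sum()

# Toy stand-ins for the frozen encoder's token states and attention mask.
enc_states = rng.normal(size=(SEQ, D_ENC))
attn_mask = np.ones(SEQ, dtype=bool)

# The only trainable piece in this sketch: a projection bridging the
# encoder's space and the LLM's space.
W_proj = rng.normal(scale=D_ENC ** -0.5, size=(D_ENC, D_LLM))

pooled = mean_pool(enc_states, attn_mask)   # (D_ENC,)
fused = pooled @ W_proj                     # (D_LLM,) lives in the LLM's space
```

The point of the sketch is the shape of the recipe: freeze both models, pool one side, and learn only the bridge between the two spaces.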
Fabian David Schmidt retweeted
Siva Reddy @sivareddyg
LLM2Vec-Gen represents a major paradigm shift for embeddings/retrieval. Why encode the query when the LLM already knows what to look for and can directly produce an embedding for it? Best part: it's self-supervised, and it does all of this while the LLM remains completely frozen.

Think about it: "solve x² + 3x − 4 = 0" has zero reasoning in it. But the LLM's response does. By encoding the response, the embedding captures the reasoning, and the better the LLM reasons, the better the embedding. This is why our results scale with model size. As LLMs get smarter, our embeddings automatically get better.

LLM2Vec-Gen is also the first demonstration of the promise of @ylecun's JEPA for text embeddings. The alignment loss is JEPA: predict in representation space, not token space. The reconstruction loss goes beyond that; it keeps embeddings decodable.

This paradigm shift opens new frontiers:
🔬 Can we build a full JEPA for language where the teacher and student are the same LLM?
⚡ Can LLMs reason in compressed space without ever generating text?
🤖 Can agents reason in compression tokens and carry that directly into retrieval?
💬 Can agents talk to each other in compression tokens instead of text: dense, fast, and still human-readable?

LLM2Vec-Gen is a first step toward all four.
Vaibhav Adlakha@vaibhav_adlakha

Your LLM already knows the answer. Why is your embedding model still encoding the question?

🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass, without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text.

🏆 SOTA self-supervised embeddings
🛡️ Free transfer of instruction-following, safety, and reasoning

7 replies · 27 retweets · 171 likes · 21.5K views
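The thread's core idea, embedding where the answer would be rather than the question, can be caricatured in a few lines of numpy: treat the frozen LLM's final-position hidden state over the prompt as the embedding (the state from which the answer would have been generated) and rank documents by cosine similarity. This is a toy sketch with random stand-in states; the method's actual projection, alignment/reconstruction losses, and decoding step are not shown, and every name and dimension here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 512  # hypothetical hidden size

def last_token_state(prompt_states):
    """The final-position hidden state already encodes the LLM's 'plan'
    for its answer; the sketch uses it directly as the embedding, so no
    answer tokens are ever generated."""
    return prompt_states[-1]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for frozen-LLM hidden states of a query and two documents.
query_states = rng.normal(size=(8, D))
doc_a = rng.normal(size=(D,))                           # unrelated document
doc_b = query_states[-1] + 0.1 * rng.normal(size=(D,))  # near-duplicate

q_emb = last_token_state(query_states)
ranked = sorted([("a", cosine(q_emb, doc_a)), ("b", cosine(q_emb, doc_b))],
                key=lambda t: -t[1])
```

In the toy setup the near-duplicate document ranks first; the claim in the thread is that, with a real LLM, the same single forward pass buys you an embedding of the *answer's* content.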
Fabian David Schmidt retweeted
Vaibhav Adlakha @vaibhav_adlakha
Your LLM already knows the answer. Why is your embedding model still encoding the question?

🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass, without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text.

🏆 SOTA self-supervised embeddings
🛡️ Free transfer of instruction-following, safety, and reasoning
4 replies · 36 retweets · 190 likes · 48.9K views
Fabian David Schmidt retweeted
Marius Mosbach @mariusmosbach
Check out our new preprint on the superficial alignment hypothesis (SAH). 👇 We operationalize the SAH via the length of the shortest program that achieves a certain performance on a task, unifying previous views on the SAH and showing how post-training affects "superficiality".
tom@tvergarabrowne

first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.

2 replies · 2 retweets · 8 likes · 720 views
Fabian David Schmidt retweeted
Cohere Labs @Cohere_Labs
Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.
30 replies · 158 retweets · 859 likes · 185.4K views
Fabian David Schmidt retweeted
Desmond Elliott @delliott
I am grateful that the Carlsberg Foundation is supporting our basic research on tokenization-free language models at the University of Copenhagen. I will be hiring Ph.D students to start in September 2026. Feel free to reach out early if you want to express informal interest.
Carlsbergfondet@Carlsbergfondet

From political science to archaeology. From astrophysics to marine biology and glaciology. Today, 159 researchers receive a grant from the Carlsberg Foundation for a wide range of basic-research initiatives. See which projects have received funding 👉bit.ly/4iK2fV2 #dkforsk

1 reply · 7 retweets · 25 likes · 2.1K views
Fabian David Schmidt retweeted
Cohere @cohere
Introducing our latest breakthrough in AI search and retrieval: Rerank 4! It’s the most advanced set of reranking models on the market, with best-in-class performance across search relevance, speed, deployment flexibility, multilingual support, and domain-specific understanding.
11 replies · 51 retweets · 170 likes · 41.1K views
Fabian David Schmidt retweeted
Josip Jukic @chatruncata
Presenting our paper "Disentangling Latent Shifts of In-Context Learning with Weak Supervision" (with Jan Šnajder) at NeurIPS 2025, San Diego: 🗓 Fri, Dec 5 · 11:00–14:00 PST 📍 Exhibit Hall C/D/E · Poster #2615 Paper: openreview.net/pdf?id=tAq9Gxd… #NeurIPS2025
0 replies · 1 retweet · 7 likes · 684 views
Fabian David Schmidt retweeted
Verna Dankers @vernadankers
Ready for day 3 of #EMNLP2025 🎉🎉 I've been on the lookout for memorization, unlearning, interp, memory module papers & more, chat w me if these topics fascinate you too😻 Looking forward to more of Suzhou, the conf & my BlackboxNLP keynote Sunday 1.45PM! blackboxnlp.github.io/2025/
0 replies · 10 retweets · 57 likes · 5.1K views
Fabian David Schmidt retweeted
Mehar Bhatia @bhatia_mehar
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
2 replies · 46 retweets · 127 likes · 39.6K views
Fabian David Schmidt retweeted
Tiancheng Hu @tiancheng_hu
Instruction tuning unlocks incredible skills in LLMs, but at a cost: they become dangerously overconfident. You face a choice: a well-calibrated base model or a capable but unreliable instruct model. What if you didn't have to choose? What if you could navigate the trade-off? (1/8)
3 replies · 4 retweets · 14 likes · 1.1K views
Fabian David Schmidt retweeted
Catherine Arnett @linguist_cat
I’m so excited that Global PIQA is out! This has been a herculean effort by our 300+ contributors. The result is an extremely high-quality, culturally-specific benchmark for over 100 languages.
Multilingual Representation Workshop @ EMNLP 2025@mrl_workshop

Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.

1 reply · 7 retweets · 35 likes · 4.5K views
Fabian David Schmidt retweeted
Multilingual Representation Workshop @ EMNLP 2025
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
2 replies · 57 retweets · 114 likes · 26.5K views
Fabian David Schmidt retweeted
Valentina Pyatkin @valentina__py
I will be giving a talk at @ETH_AI_Center next week, on RLVR for verifiable instruction following, generalization, and reasoning! 📢 Join if you are in Zurich and interested in hearing about IFBench and our latest Olmo and Tülu works at @allen_ai
3 replies · 9 retweets · 109 likes · 10.2K views
Fabian David Schmidt retweeted
Marius Mosbach @mariusmosbach
Come talk to @Ara_Krishnan and me about our recent paper on frequency effects of unlearning and how @allen_ai 's Olmo model and toolkit made this work so much easier. 🚀
Ai2@allen_ai

Olmo isn’t just open weights—it’s an open research stack. Try it in the Ai2 Playground: playground.allenai.org AMA on Discord: Tues, Oct 28 @ 8:00 AM PT with some of the researchers behind these studies + an Ai2 Olmo teammate. Join: discord.gg/ai2

1 reply · 5 retweets · 19 likes · 1.5K views
Fabian David Schmidt retweeted
Marius Mosbach @mariusmosbach
If you are thinking a lot about CoT and multi-agent communication these days, check out Michael's work below 👇. And make sure to keep an eye on his work going forward, more great things to come! 👨🏻‍🍳
Michael Rizvi-Martel@frisbeemortel

Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)

0 replies · 3 retweets · 10 likes · 1.5K views
Fabian David Schmidt retweeted
Michael Rizvi-Martel @frisbeemortel
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
1 reply · 11 retweets · 35 likes · 13.5K views