Nicholas Meade

110 posts

@ncmeade

PhD Student at @McGillU / @Mila_Quebec; AI Safety

Joined September 2020
203 Following · 200 Followers
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
SAEs fail at OOD tasks. Why? Features in superposition are linearly representable but not linearly accessible. Instead of discarding sparse coding, we embrace the geometry of superposition and use methods equipped to handle the nonlinearity it induces.
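To make the "linearly representable but not linearly accessible" distinction concrete, here is a minimal sketch (my illustration, not from the thread): three sparse features packed into two dimensions each have a linear direction, but reading one out cleanly requires a nonlinearity, because the directions interfere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three unit feature directions packed into a 2-d space (superposition):
# each feature is linearly *representable* as a direction.
W = np.array([[1.0, 0.0],
              [-0.5, np.sqrt(3) / 2],
              [-0.5, -np.sqrt(3) / 2]])

# Sparse activations: each sample turns on exactly one feature
# (a simplifying assumption for this sketch).
acts = np.eye(3)[rng.integers(0, 3, size=1000)]   # (1000, 3)
hidden = acts @ W                                 # (1000, 2)

# A linear readout of feature 0 picks up -0.5 interference whenever another
# feature fires; a ReLU (nonlinear access) removes it exactly here.
linear_read = hidden @ W[0]
nonlinear_read = np.maximum(linear_read, 0.0)

print(np.abs(linear_read - acts[:, 0]).max())     # 0.5: linear probe fails
print(np.abs(nonlinear_read - acts[:, 0]).max())  # 0.0: nonlinear read succeeds
```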
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
Mechanistic interpretability aims to understand models — and the more superhuman or incoherent they become, the more we need that understanding to be reliable. We propose a framework for this, drawing on established tools from causal reasoning and statistical identifiability: 🧵
Nicholas Meade retweeted
Vaibhav Adlakha @vaibhav_adlakha
Your LLM already knows the answer. Why is your embedding model still encoding the question? 🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass — without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text. 🏆 SOTA self-supervised embeddings 🛡️ Free transfer of instruction-following, safety, and reasoning
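The thread doesn't spell out the mechanism, but the headline idea, producing the answer's embedding from a single forward pass over the question with a frozen decoder, might look roughly like the sketch below; the model name and last-token pooling are placeholder assumptions, not the paper's choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # placeholder model
model = AutoModel.from_pretrained("gpt2").eval()     # frozen: never updated

@torch.no_grad()
def embed(question: str) -> torch.Tensor:
    # One forward pass over the *question*; no answer tokens are generated.
    inputs = tokenizer(question, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state       # (1, seq_len, dim)
    return hidden[:, -1, :]                          # last-token pooling (assumed)

e = embed("solve x^2 + 3x - 4 = 0")
print(e.shape)   # torch.Size([1, 768]) for gpt2
```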
Nicholas Meade retweeted
Siva Reddy @sivareddyg
LLM2Vec-Gen represents a major paradigm shift for embeddings/retrieval. Why encode the query when the LLM already knows what to look for and can directly produce an embedding for it? Best part: it’s self-supervised, and it does all of this while the LLM remains completely frozen.

Think about it: "solve x² + 3x − 4 = 0" has zero reasoning in it. But the LLM's response does. By encoding the response, the embedding captures the reasoning, and the better the LLM reasons, the better the embedding. This is why our results scale with model size. As LLMs get smarter, our embeddings automatically get better.

LLM2Vec-Gen is also the first demonstration of the promise of @ylecun's JEPA for text embeddings. The alignment loss is JEPA: predict in representation space, not token space. The reconstruction loss goes beyond that: it keeps embeddings decodable.

This paradigm shift opens new frontiers:
🔬 Can we build a full JEPA for language where the teacher and student are the same LLM?
⚡ Can LLMs reason in compressed space without ever generating text?
🤖 Can agents reason in compression tokens and carry that directly into retrieval?
💬 Can agents talk to each other in compression tokens instead of text: dense, fast, and still human-readable?

LLM2Vec-Gen is a first step toward all four.
[Quoting Vaibhav Adlakha @vaibhav_adlakha's LLM2Vec-Gen announcement, retweeted above]
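Reading off the thread, the two training signals could be sketched as follows; this is my paraphrase of "alignment loss in representation space" plus "reconstruction loss for decodability", with all names and the stop-gradient choice being assumptions rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def alignment_loss(pred_emb: torch.Tensor, target_emb: torch.Tensor) -> torch.Tensor:
    # JEPA-style: predict in representation space, not token space. Pull the
    # question-side embedding toward the embedding of the LLM's response
    # (the stop-gradient on the target is an assumption here).
    return 1.0 - F.cosine_similarity(pred_emb, target_emb.detach(), dim=-1).mean()

def reconstruction_loss(decoder_logits: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
    # Keeps the embedding decodable: token-level cross-entropy when the frozen
    # LLM decodes the embedding back into the response text.
    return F.cross_entropy(decoder_logits.flatten(0, 1), response_ids.flatten())

# total = alignment_loss(q_emb, resp_emb) + lam * reconstruction_loss(logits, ids)
```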
Nicholas Meade retweeted
Michael Rizvi-Martel @frisbeemortel
I wrote my first ever blog post! "Agentic Coding: A New Abstraction Layer in the Programming Stack" I give some thoughts about how my coding changed with agents, where I think it's headed, and how the resistance to adopting them echoes past shifts in CS. Link below👇
Nicholas Meade retweeted
tom @tvergarabrowne
first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.
Nicholas Meade retweeted
Benno Krojer @benno_krojer
🚨 New paper! Are the visual tokens going into an LLM interpretable? 🤔 Existing methods (e.g. the logit lens) and common assumptions would lead you to think “not much”... We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡 Details 🧵
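LatentLens itself isn't described in this tweet, but the baseline it improves on, the logit lens, is standard: project each layer's hidden state at a token position through the final norm and unembedding, then inspect the nearest vocabulary items. A minimal text-only sketch with a placeholder model (a VLM's visual-token positions would be probed the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def logit_lens(text: str, position: int = -1, top_k: int = 5):
    inputs = tok(text, return_tensors="pt")
    # hidden_states: one tensor per layer, each of shape (1, seq_len, dim)
    hiddens = model(**inputs, output_hidden_states=True).hidden_states
    for layer, h in enumerate(hiddens):
        # Project the intermediate state through the final LayerNorm and the
        # unembedding matrix, as if the network stopped at this layer.
        logits = model.lm_head(model.transformer.ln_f(h[0, position]))
        top = logits.topk(top_k).indices
        print(layer, tok.convert_ids_to_tokens(top.tolist()))

logit_lens("The Eiffel Tower is in the city of")
```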
Nicholas Meade retweeted
Mehar Bhatia @bhatia_mehar
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
Nicholas Meade retweeted
Amirhossein Kazemnejad @a_kazemnejad
It’s clear next-gen reasoning LLMs will run for millions of tokens. RL at 1M tokens needs ~100× the compute of RL at 128K. Our Markovian Thinking keeps compute scaling linear instead. Check out Milad's thread; some of my perspectives below:
[Quoting Milad Aghajohari @MAghajohari's announcement, retweeted in full below]
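A back-of-envelope check on the "~100×" figure (my arithmetic, not the authors'): if attention makes rollout compute grow quadratically with thought length, then going from a 128K to a 1M budget costs about

```latex
\[
  \left(\frac{10^{6}}{1.28 \times 10^{5}}\right)^{2} \approx 7.8^{2} \approx 61\times,
\]
```

which is the right order of magnitude for the quoted ~100× once longer-rollout overheads are counted; a fixed-size (Markovian) window instead keeps per-token cost constant, so total compute stays linear in thought length.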
Nicholas Meade retweeted
Milad Aghajohari @MAghajohari
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), and O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2X accuracy 🧵
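Based only on the public description above, the control flow might look like this sketch: reason in fixed-size chunks and carry a bounded state forward, so each chunk's context (and attention cost) is O(1) and total compute is O(n) in thought length. The `generate` stub, chunk size, and carry-over format are all illustrative assumptions, not the authors' code.

```python
CHUNK_TOKENS = 8192     # fixed per-chunk thinking budget (illustrative)
CARRY_CHARS = 2000      # how much of the chunk tail to carry forward (illustrative)

def generate(prompt: str, max_tokens: int) -> str:
    """Stub for an LLM call; assumed to exist in your serving stack."""
    raise NotImplementedError

def markovian_think(question: str, max_chunks: int) -> str:
    state = question                      # bounded state, not the full trace
    out = ""
    for _ in range(max_chunks):
        out = generate(state, max_tokens=CHUNK_TOKENS)
        if "FINAL:" in out:               # model marks a finished answer
            return out.split("FINAL:", 1)[1].strip()
        # Markovian step: the next chunk sees only the question plus a
        # fixed-size carry-over, never the whole reasoning history.
        state = question + "\n[carry]\n" + out[-CARRY_CHARS:]
    return out                            # budget exhausted; return last chunk
```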
Nicholas Meade retweeted
Alexander Panfilov @kotekjedi_ml
🚨 New paper! LLMs, when asked harmful questions, sometimes produce outputs that look helpful (and harmful) but are actually deliberately wrong. What's bad: current LLM-based jailbreak scorers can't tell the difference (me neither). More in 🧵👇
Nicholas Meade @ncmeade
If you're interested in working on agent safety (and are a student in Canada), you should apply to this! @gspandana is extremely smart and one of the kindest people I've gotten to work with.
Quoting Spandana Gella @gspandana:
Internship @ServiceNowRSRCH to build the next generation of computer-use agents that are safe and secure from malicious attacks. Focus on intervention strategies and defenses to make agents robust against unsafe behavior. Apply here: bit.ly/3V3mmTg
Nicholas Meade retweeted
Maksym Andriushchenko @maksym_andr
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨

Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 (i.e., as soon as possible) and those applying through centralized programs like CLS, IMPRS, and ELLIS (the deadlines are in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns. Fill in the Google form below if you're interested!

Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in the alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks. We're also interested in rigorous AI evaluations and in informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks. For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report
- An Approach to Technical AGI Safety and Security by DeepMind
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research

Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts, literally anything that can be genuinely useful for other researchers and the general public.

Broader vision. Current machine learning methods are fundamentally different from what they were pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective". Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us. We believe getting this (some may call it "AGI") right is one of the most important challenges of our time. Join us on this journey!
Nicholas Meade retweeted
Siva Reddy @sivareddyg
What's the path to scalable and safe web agents? Are web agents the new semantic parsing? I will be giving a talk at the ACL REALM workshop today at 9:30 am. Come check it out if you are interested in the history and contemporary work in this area. Lots of other exciting speakers. #ACL2025 #ACL2025NLP x.com/nouhadziri/sta…
Nicholas Meade @ncmeade
Come by our #ACL2025 poster tomorrow to discuss the safety risks surrounding increasingly capable instruction-following retrievers (or anything safety-related)! 16:00-17:30 on Tuesday in Hall 4/5.
Quoting Parishad BehnamGhader @ParishadBehnam:
Come and visit our poster on the Safety of Retrievers @aclmeeting 🗓️ Tuesday, Findings Posters, 16:00-17:30 🚨 Instruction-following retrievers will become increasingly good tools for searching for harmful or sensitive information. 🚨
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
Quoting Shruti Joshi @_shruti_joshi_:
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
Nicholas Meade @ncmeade
I'll be at #ICML2025 this week presenting SafeArena (Wednesday 11AM - 1:30PM in East Exhibition Hall E-701). Come by to chat with me about web agent safety (or anything else safety-related)!