Nicholas Meade

110 posts

@ncmeade

PhD Student at @McGillU / @Mila_Quebec; AI Safety

Joined September 2020
203 Following · 200 Followers
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
SAEs fail at OOD tasks. Why? Features in superposition are linearly representable but not linearly accessible. Instead of discarding sparse coding, we embrace the geometry of superposition and use methods equipped to handle the nonlinearity it induces.
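To make the "linearly representable but not linearly accessible" distinction concrete, here is a minimal sketch (my illustration, not from the thread): three sparse features packed into two dimensions each have a linear direction, but reading one out cleanly requires a nonlinearity, because the directions interfere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three unit feature directions packed into a 2-d space (superposition):
# each feature is linearly *representable* as a direction.
W = np.array([[1.0, 0.0],
              [-0.5, np.sqrt(3) / 2],
              [-0.5, -np.sqrt(3) / 2]])

# Sparse activations: each sample turns on exactly one feature
# (a simplifying assumption for this sketch).
acts = np.eye(3)[rng.integers(0, 3, size=1000)]   # (1000, 3)
hidden = acts @ W                                 # (1000, 2)

# A linear readout of feature 0 picks up -0.5 interference whenever another
# feature fires; a ReLU (nonlinear access) removes it exactly here.
linear_read = hidden @ W[0]
nonlinear_read = np.maximum(linear_read, 0.0)

print(np.abs(linear_read - acts[:, 0]).max())     # 0.5: linear probe fails
print(np.abs(nonlinear_read - acts[:, 0]).max())  # 0.0: nonlinear read succeeds
```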
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
Mechanistic interpretability aims to understand models — and the more superhuman or incoherent they become, the more we need that understanding to be reliable. We propose a framework for this, drawing on established tools from causal reasoning and statistical identifiability: 🧵
Nicholas Meade retweeted
Vaibhav Adlakha @vaibhav_adlakha
Your LLM already knows the answer. Why is your embedding model still encoding the question? 🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass — without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text. 🏆 SOTA self-supervised embeddings 🛡️ Free transfer of instruction-following, safety, and reasoning
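The thread doesn't spell out the mechanism, but the headline idea, producing the answer's embedding from a single forward pass over the question with a frozen decoder, might look roughly like the sketch below; the model name and last-token pooling are placeholder assumptions, not the paper's choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # placeholder model
model = AutoModel.from_pretrained("gpt2").eval()     # frozen: never updated

@torch.no_grad()
def embed(question: str) -> torch.Tensor:
    # One forward pass over the *question*; no answer tokens are generated.
    inputs = tokenizer(question, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state       # (1, seq_len, dim)
    return hidden[:, -1, :]                          # last-token pooling (assumed)

e = embed("solve x^2 + 3x - 4 = 0")
print(e.shape)   # torch.Size([1, 768]) for gpt2
```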
Nicholas Meade retweeted
Siva Reddy @sivareddyg
LLM2Vec-Gen represents a major paradigm shift for embeddings/retrieval. Why encode the query when the LLM already knows what to look for and can directly produce an embedding for it? Best part: it’s self-supervised, and it does all of this while the LLM remains completely frozen.

Think about it: "solve x² + 3x − 4 = 0" has zero reasoning in it. But the LLM's response does. By encoding the response, the embedding captures the reasoning, and the better the LLM reasons, the better the embedding. This is why our results scale with model size. As LLMs get smarter, our embeddings automatically get better.

LLM2Vec-Gen is also the first demonstration of the promise of @ylecun's JEPA for text embeddings. The alignment loss is JEPA: predict in representation space, not token space. The reconstruction loss goes beyond that: it keeps embeddings decodable.

This paradigm shift opens new frontiers:
🔬 Can we build a full JEPA for language where the teacher and student are the same LLM?
⚡ Can LLMs reason in compressed space without ever generating text?
🤖 Can agents reason in compression tokens and carry that directly into retrieval?
💬 Can agents talk to each other in compression tokens instead of text: dense, fast, and still human-readable?

LLM2Vec-Gen is a first step toward all four.
[Quoting Vaibhav Adlakha @vaibhav_adlakha's LLM2Vec-Gen announcement, retweeted above]
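Reading off the thread, the two training signals could be sketched as follows; this is my paraphrase of "alignment loss in representation space" plus "reconstruction loss for decodability", with all names and the stop-gradient choice being assumptions rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def alignment_loss(pred_emb: torch.Tensor, target_emb: torch.Tensor) -> torch.Tensor:
    # JEPA-style: predict in representation space, not token space. Pull the
    # question-side embedding toward the embedding of the LLM's response
    # (the stop-gradient on the target is an assumption here).
    return 1.0 - F.cosine_similarity(pred_emb, target_emb.detach(), dim=-1).mean()

def reconstruction_loss(decoder_logits: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
    # Keeps the embedding decodable: token-level cross-entropy when the frozen
    # LLM decodes the embedding back into the response text.
    return F.cross_entropy(decoder_logits.flatten(0, 1), response_ids.flatten())

# total = alignment_loss(q_emb, resp_emb) + lam * reconstruction_loss(logits, ids)
```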
Nicholas Meade retweeted
Michael Rizvi-Martel @frisbeemortel
I wrote my first ever blog post! "Agentic Coding: A New Abstraction Layer in the Programming Stack" I give some thoughts about how my coding changed with agents, where I think it's headed, and how the resistance to adopting them echoes past shifts in CS. Link below👇
Nicholas Meade retweeted
tom @tvergarabrowne
first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.
Nicholas Meade retweeted
Benno Krojer @benno_krojer
🚨 New paper! Are the visual tokens going into an LLM interpretable? 🤔 Existing methods (e.g. the logit lens) and common assumptions would lead you to think “not much”... We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡 Details 🧵
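LatentLens itself isn't described in this tweet, but the baseline it improves on, the logit lens, is standard: project each layer's hidden state at a token position through the final norm and unembedding, then inspect the nearest vocabulary items. A minimal text-only sketch with a placeholder model (a VLM's visual-token positions would be probed the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def logit_lens(text: str, position: int = -1, top_k: int = 5):
    inputs = tok(text, return_tensors="pt")
    # hidden_states: one tensor per layer, each of shape (1, seq_len, dim)
    hiddens = model(**inputs, output_hidden_states=True).hidden_states
    for layer, h in enumerate(hiddens):
        # Project the intermediate state through the final LayerNorm and the
        # unembedding matrix, as if the network stopped at this layer.
        logits = model.lm_head(model.transformer.ln_f(h[0, position]))
        top = logits.topk(top_k).indices
        print(layer, tok.convert_ids_to_tokens(top.tolist()))

logit_lens("The Eiffel Tower is in the city of")
```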
Nicholas Meade retweeted
Mehar Bhatia @bhatia_mehar
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
Nicholas Meade retweeted
Amirhossein Kazemnejad @a_kazemnejad
It’s clear next-gen reasoning LLMs will run for millions of tokens. RL at 1M tokens needs ~100× the compute of RL at 128K. Our Markovian Thinking keeps compute scaling linear instead. Check out Milad's thread; some of my perspectives below:
[Quoting Milad Aghajohari @MAghajohari's announcement, retweeted in full below]
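A back-of-envelope check on the "~100×" figure (my arithmetic, not the authors'): if attention makes rollout compute grow quadratically with thought length, then going from a 128K to a 1M budget costs about

```latex
\[
  \left(\frac{10^{6}}{1.28 \times 10^{5}}\right)^{2} \approx 7.8^{2} \approx 61\times,
\]
```

which is the right order of magnitude for the quoted ~100× once longer-rollout overheads are counted; a fixed-size (Markovian) window instead keeps per-token cost constant, so total compute stays linear in thought length.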
Nicholas Meade retweeted
Milad Aghajohari @MAghajohari
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), and O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2X accuracy 🧵
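Based only on the public description above, the control flow might look like this sketch: reason in fixed-size chunks and carry a bounded state forward, so each chunk's context (and attention cost) is O(1) and total compute is O(n) in thought length. The `generate` stub, chunk size, and carry-over format are all illustrative assumptions, not the authors' code.

```python
CHUNK_TOKENS = 8192     # fixed per-chunk thinking budget (illustrative)
CARRY_CHARS = 2000      # how much of the chunk tail to carry forward (illustrative)

def generate(prompt: str, max_tokens: int) -> str:
    """Stub for an LLM call; assumed to exist in your serving stack."""
    raise NotImplementedError

def markovian_think(question: str, max_chunks: int) -> str:
    state = question                      # bounded state, not the full trace
    out = ""
    for _ in range(max_chunks):
        out = generate(state, max_tokens=CHUNK_TOKENS)
        if "FINAL:" in out:               # model marks a finished answer
            return out.split("FINAL:", 1)[1].strip()
        # Markovian step: the next chunk sees only the question plus a
        # fixed-size carry-over, never the whole reasoning history.
        state = question + "\n[carry]\n" + out[-CARRY_CHARS:]
    return out                            # budget exhausted; return last chunk
```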
Nicholas Meade retweeted
Alexander Panfilov @kotekjedi_ml
🚨 New paper! LLMs, when asked harmful questions, sometimes produce outputs that look helpful (and harmful) but are actually deliberately wrong. What's bad: current LLM-based jailbreak scorers can't tell the difference (me neither). More in 🧵👇
Nicholas Meade @ncmeade
If you're interested in working on agent safety (and are a student in Canada), you should apply to this! @gspandana is extremely smart and one of the kindest people I've gotten to work with.
Quoting Spandana Gella @gspandana:
Internship @ServiceNowRSRCH to build the next generation of computer-use agents that are safe and secure from malicious attacks. Focus on intervention strategies and defenses to make agents robust against unsafe behavior. Apply here: bit.ly/3V3mmTg
Nicholas Meade retweeted
Maksym Andriushchenko @maksym_andr
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨

Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 (i.e., as soon as possible) and those applying through centralized programs like CLS, IMPRS, and ELLIS (the deadlines are in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns. Fill in the Google form below if you're interested!

Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in the alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks. We're also interested in rigorous AI evaluations and in informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks. For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report
- An Approach to Technical AGI Safety and Security by DeepMind
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research

Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts, literally anything that can be genuinely useful for other researchers and the general public.

Broader vision. Current machine learning methods are fundamentally different from what they were pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective". Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us. We believe getting this (some may call it "AGI") right is one of the most important challenges of our time. Join us on this journey!
Nicholas Meade retweeted
Siva Reddy @sivareddyg
What's the path to scalable and safe web agents? Are web agents the new semantic parsing? I will be giving a talk at the ACL REALM workshop today at 9:30 am. Come check it out if you are interested in the history and contemporary work in this area. Lots of other exciting speakers. #ACL2025 #ACL2025NLP x.com/nouhadziri/sta…
Nicholas Meade @ncmeade
Come by our #ACL2025 poster tomorrow to discuss the safety risks surrounding increasingly capable instruction-following retrievers (or anything safety-related)! 16:00-17:30 on Tuesday in Hall 4/5.
Quoting Parishad BehnamGhader @ParishadBehnam:
Come and visit our poster on the Safety of Retrievers @aclmeeting 🗓️ Tuesday, Findings Posters, 16:00-17:30 🚨 Instruction-following retrievers will become increasingly good tools for searching for harmful or sensitive information. 🚨
Nicholas Meade retweeted
Shruti Joshi @_shruti_joshi_
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
Quoting Shruti Joshi @_shruti_joshi_:
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
Nicholas Meade @ncmeade
I'll be at #ICML2025 this week presenting SafeArena (Wednesday 11AM - 1:30PM in East Exhibition Hall E-701). Come by to chat with me about web agent safety (or anything else safety-related)!