EdinburghNLP

1.3K posts


@EdinburghNLP

The Natural Language Processing Group at the University of Edinburgh.

Edinburgh, Scotland · Joined May 2017
160 Following · 13.5K Followers
Pinned Tweet
EdinburghNLP@EdinburghNLP·
Join our PhD programme in Designing Responsible Natural Language Processing at the UKRI AI Centre for Doctoral Training, University of Edinburgh. Applications are now re-opened for Home fee status candidates (past candidates need not re-apply). responsiblenlp.org
EdinburghNLP retweeted
Vivek Iyer@remorax98·
Super excited to share my internship project at FAIR @AIatMeta 🚀 We introduce Spectrum -- an encoder-decoder LM pretrained using omnilingual & cross-modal sentence embeddings. Trained on English datasets alone, it outperforms strong baselines like Llama and SpiritLM on multilingual (900+ languages) and speech understanding benchmarks — despite never being directly exposed to multilingual or speech data during training. Curious how? Read on -- and check out the OmniSONAR technical report for the full details: ai.meta.com/research/publi… 👀🧵
EdinburghNLP retweeted
Yifu Qiu@yifuqiu98·
Glad to see model steering in the spectral space works for attention and the long context as well! We also show that spectral editing of activations can steer model behavior to alleviate hallucination and bias! proceedings.neurips.cc/paper_files/pa…
Waylon Li@li_waylon

🚀 Excited to share our paper "Spectral Attention Steering for Prompt Highlighting" has been accepted to ICLR 2026 and the camera-ready version is finally live! We’ve found a way to steer LLM attention that is actually effective, fast and compatible with modern hardware.
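The spectral steering idea above can be illustrated with a generic sketch: extract the dominant direction separating two sets of hidden states via SVD, then shift activations along it. This is a toy illustration of the general pattern, not the papers' actual method; `steering_direction`, `steer`, and the data are all invented for the example.

```python
import numpy as np

def steering_direction(acts_pos, acts_neg):
    """Top right-singular vector of the difference between two sets of
    hidden states (rows = examples, columns = hidden dimensions)."""
    diff = acts_pos - acts_neg
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[0]

def steer(hidden, direction, alpha=1.0):
    """Shift a hidden state along the steering direction."""
    return hidden + alpha * direction

# Toy data: the two sets differ only along the first hidden dimension,
# so the recovered direction should align with that axis.
rng = np.random.default_rng(0)
acts_neg = rng.normal(size=(16, 8))
acts_pos = acts_neg.copy()
acts_pos[:, 0] += 2.0
direction = steering_direction(acts_pos, acts_neg)
```

Applying `steer` at inference time then nudges a hidden state toward the behaviour associated with the positive set.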

EdinburghNLP retweeted
Waylon Li@li_waylon·
🚀 Excited to share our paper "Spectral Attention Steering for Prompt Highlighting" has been accepted to ICLR 2026 and the camera-ready version is finally live! We’ve found a way to steer LLM attention that is actually effective, fast and compatible with modern hardware.
EdinburghNLP retweeted
Farooq Wani@wanifarooq848·
Your VLM gives the same answer before and after a tiny image change. So it's robust, right? Wrong. In our new paper, we show that VLMs can preserve their predictions while their internal representations drift to regions normally occupied by completely unrelated images. 🧵👇
EdinburghNLP retweeted
Filip Szatkowski@f_szatkowski·
We are presenting "Universal Properties of Activation Sparsity in Modern Large Language Models" at ICLR 2026! We ask a simple question: how sparse are modern LLMs, really — and does it matter? 👇
EdinburghNLP retweeted
Zheng Zhao@zhengzhao97·
🎉 Thrilled to announce our paper "Verifying Chain-of-Thought Reasoning via Its Computational Graph" has been accepted as an ICLR 2026 ORAL! 🚨 We look inside the "black box" to detect reasoning errors by analyzing the model's internal circuit. 🧠⚡️ Read more on CRV 👇
Zheng Zhao@zhengzhao97

Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.

EdinburghNLP retweeted
Edoardo Ponti@PontiEdoardo·
World modelling simulates possible futures from past states and actions. But actions are scarce and ambiguous. Can we teach foundation models (LLMs/VLMs) world modelling from data without action labels? We introduce “Self-Improving World Modelling with Iterative RL” (SWIRL ꩜)
EdinburghNLP retweeted
Hongru Wang@HongruWang007·
We've made a major revision to our theory of agents, especially incorporating valuable feedback from RL/ML friends!! 🧠 Theory of Agent (ToA): a position on tool-augmented intelligence arxiv.org/pdf/2506.00886… It provides an early unified theory that models the relationship between reasoning and acting, sheds light on what makes a good agent, and lays out a roadmap for training such agents. I promise it's worth reading if you study agents. 🥳🥳
Why this matters 🚨
If we train agents that always outsource thinking/reasoning:
• they look powerful
• they stay shallow
• they don't grow intelligence
ToA reframes agent alignment as epistemic self-restraint, not tool maximalism.
EdinburghNLP retweeted
Pasquale Minervini@PMinervini·
Folks, for those of you who may need to hear this -- if things didn't go well with #ICLR2026 don't worry, it doesn't define you as a person, and you can use the feedback to improve your work and aim for another deadline! (ICML/KDD..) Just do your best, and everything will be fine
EdinburghNLP retweeted
Piotr Nawrot@p_nawrot·
🚀📉 Efficient Inference Just Got a Major Upgrade #NVIDIAResearch We've just released Qwen3-8B-DMS-8x, fine-tuned for 8x KV cache compression. It maintains dense-model accuracy on demanding tasks like AIME24, and is perfect for inference-time scaling. The code on HF works out of the box. With DMS we fine-tune models end-to-end via distillation; this works much better than the "token importance" proxies used in typical eviction methods. It's state-of-the-art for KV eviction tailored for fast inference: it adds a negligible number of parameters and little computation to each KV head, and requires as few as 1K fine-tuning steps to reach 8x compression. It speeds up both the prefill and generation phases of Transformer LLMs, and can be combined with sparse attention methods such as DSA. Co-authors: @AdrianLancucki, @CStanKonrad, @PontiEdoardo Links in the comments 👇
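For contrast with DMS's learned, end-to-end approach, the "token importance" proxy used by typical eviction methods can be sketched in a few lines. This is a hypothetical baseline for illustration only, not DMS or any specific published method; `evict_kv` and the toy cache are invented.

```python
import numpy as np

def evict_kv(keys, values, attn, keep):
    """Naive importance-based eviction: keep the `keep` cache positions
    that received the most accumulated attention mass.

    keys, values: (seq, head_dim) cached tensors for one head
    attn:         (queries, seq) attention weights observed so far
    """
    importance = attn.sum(axis=0)                    # total mass per position
    kept = np.sort(np.argsort(importance)[-keep:])   # retain original order
    return keys[kept], values[kept], kept

# Toy cache of 6 positions; positions 1 and 4 dominate the attention mass.
keys = np.arange(12.0).reshape(6, 2)
values = keys * 10
attn = np.zeros((3, 6))
attn[:, 1] = 0.6
attn[:, 4] = 0.3
attn[:, 0] = 0.1
k2, v2, kept = evict_kv(keys, values, attn, keep=2)
```

The weakness of such proxies, as the tweet notes, is that past attention mass is only a rough stand-in for which entries future queries will actually need.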
EdinburghNLP retweeted
Pasquale Minervini@PMinervini·
If you are interested in manually curated and leak-free benchmarks, check out MMLU-Redux: aclanthology.org/2025.naacl-lon… (NAACL 2025) -- we found significant issues in several MMLU topics/subsets and manually fixed them with a pool of human experts
Eric W. Tramel@fujikanaeda

The presence of a leading whitespace leaks the correct choice selection in the MMLU-Pro benchmark. Am I missing something? Seems to impact Chemistry, Physics, and Math. HF Issue in reply.
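The kind of leak described above is easy to check for mechanically. A hypothetical sketch (`leaky_option` and the toy items are illustrative, not MMLU-Pro's actual data or the reported issue's exact shape):

```python
def leaky_option(choices):
    """Return the index of the single answer option carrying leading
    whitespace, if exactly one exists -- a formatting tell that could
    reveal the answer without reading the question."""
    flagged = [i for i, c in enumerate(choices) if c != c.lstrip()]
    return flagged[0] if len(flagged) == 1 else None

# A toy item where the stray space betrays option 2:
leaky_option(["4", "6", " 8", "10"])   # -> 2
leaky_option(["4", "6", "8", "10"])    # -> None
```

Running a scan like this over each subset would show whether the tell correlates with the gold label, as the tweet reports for Chemistry, Physics, and Math.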

EdinburghNLP retweeted
Cyrus Wai-Chung Kwan@cyruskwan1997·
OpenSIR: Open-Ended Self-Improving Reasoner
Can LLMs teach themselves math without any training data? OpenSIR is an open-ended self-play framework where:
• Teacher proposes diverse, appropriately challenging problems
• Student learns to solve them
• Both co-evolve together
1/6
EdinburghNLP retweeted
Frank Keller@frank_e_keller·
I'm excited to announce that our work on contrastive learning for story salience has been accepted at EACL 2026. Thanks to my brilliant co-authors, Igor Sterner and Alex Lascarides. Paper: arxiv.org/abs/2601.07765 Data and Code: github.com/igorsterner/Na…
EdinburghNLP retweeted
Pasquale Minervini@PMinervini·
This guide from the amazing folks at @huggingface features Intra-Document Causal Masking (@yuzhaouoe et al., arxiv.org/abs/2402.13991, ACL'24 Oral), a key ingredient of all frontier LLM pre-training recipes!
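The idea behind intra-document causal masking can be sketched minimally: when several documents are packed into one training sequence, each token attends causally but only within its own document. A toy illustration, not the paper's implementation (`intra_document_causal_mask` is an invented name):

```python
import numpy as np

def intra_document_causal_mask(doc_ids):
    """Position i may attend to position j iff j <= i (causal) and
    both tokens come from the same packed document."""
    doc_ids = np.asarray(doc_ids)
    n = len(doc_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    return causal & same_doc

# Two documents packed into one sequence: tokens 0-2 are doc 0, 3-4 are doc 1.
mask = intra_document_causal_mask([0, 0, 0, 1, 1])
# Token 3 (first token of doc 1) cannot attend across the boundary into doc 0.
```

This keeps the throughput benefits of sequence packing without letting unrelated documents attend to each other.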
Ahmad@TheAhmadOsman

Hugging Face has released a 214-page MASTERCLASS on how to train LLMs
> it's called The Smol Training Playbook
> and if you want to learn how to train LLMs,
> this GIFT is for you
> this training bible walks you through the ENTIRE pipeline
> covers every concept that matters from why you train,
> to what you train, to how you actually pull it off
> from pre-training, to mid-training, to post-training
> it turns vague buzzwords into step-by-step decisions
> architecture, tokenization, data strategy, and infra
> highlights the real-world gotchas
> instabilities, scaling headaches, debugging nightmares
> distills lessons from building actual
> state-of-the-art LLMs, not just toy models

how modern transformer models are actually built
> tokenization: the secret foundation of every LLM
> tokenizer fundamentals
> vocabulary size
> byte pair encoding
> custom vs existing tokenizers
> all the modern attention mechanisms are here
> multi-head attention
> multi-query attention
> grouped-query attention
> multi-latent attention
> every positional encoding trick in the book
> absolute position embedding
> rotary position embedding
> YaRN (yet another RoPE extension)
> ablate-by-frequency positional encoding
> no position embedding
> randomized no position embedding
> stability hacks that actually work
> z-loss regularization
> query-key normalization
> removing weight decay from embedding layers
> sparse scaling, handled
> mixture-of-experts scaling
> activation ratio tuning
> choosing the right granularity
> sharing experts between layers
> load balancing across experts
> long-context handling via SSMs
> hybrid models: transformer plus state space models

data curation = most of your real model quality
> data curation is the main driver of your model's actual quality
> architecture alone won't save you
> building the right data mixture is an art,
> not just dumping in more web scrapes
> curriculum learning, adaptive mixes, ablate everything
> you need curriculum learning:
> design data mixes that evolve as training progresses
> use adaptive mixtures that shift emphasis
> based on model stage and performance
> ablate everything: run experiments to systematically
> test how each data source or filter impacts results
> smollm3 data
> the smollm3 recipe: balanced english web data,
> broad multilingual sources, high-quality code, and diverse math datasets
> without the right data pipeline,
> even the best architecture will underperform

the training marathon
> do your preflight checklist or die
> check your infrastructure,
> validate your evaluation pipelines,
> set up logging, and configure alerts
> so you don't miss silent failures
> scaling surprises are inevitable
> things will break at scale in ways they never did in testing
> vanishing throughput? that usually means
> you've got a hidden shape mismatch or
> batch dimension bug killing your GPU utilization
> sudden drops in throughput?
> check your software stack for inefficiencies,
> resource leaks, or bad dataloader code
> seeing noisy, spiky loss values?
> your data shuffling is probably broken,
> and the model is seeing repeated or ordered data
> performance worse than expected?
> look for subtle parallelism bugs
> tensor parallel, data parallel,
> or pipeline parallel gone rogue
> monitor like your GPUs depend on it (because they do)
> watch every metric, track utilization, spot anomalies fast
> mid-training is not autopilot
> swap in higher-quality data to improve learning,
> extend the context window if you want bigger inputs,
> and use multi-stage training curricula to maximize gains
> the difference between a good model and a failed run is
> almost always vigilance and relentless debugging during this marathon

post-training
> post-training is where your raw base model
> actually becomes a useful assistant
> always start with supervised fine-tuning (SFT)
> use high-quality, well-structured chat data and
> pick a solid template for consistent turns
> SFT gives you a stable, cost-effective baseline
> don't skip it, even if you plan to go deeper
> next, optimize for user preferences
> direct preference optimization (DPO),
> or its variants like kernelized (KTO),
> online (ORPO), or adversarial (APO)
> these methods actually teach the model
> what "better" looks like beyond simple mimicry
> once you've got preference alignment, go on-policy:
> reinforcement learning from human feedback (RLHF)
> or on-policy distillation, which lets your model learn
> from real interactions or stronger models
> this is how you get reliability and sharper behaviors
> the post-training pipeline is where
> assistants are truly sculpted;
> skipping steps means leaving performance,
> safety, and steerability on the table

infra is the boss fight
> this is where most teams lose time,
> money, and sanity if they're not careful
> inside every GPU
> you've got tensor cores and CUDA cores for the heavy math,
> plus a memory hierarchy (registers, shared memory, HBM)
> that decides how fast you can feed data to the compute units
> outside the GPU, your interconnects matter
> PCIe for GPU-to-CPU,
> NVLink for ultra-fast GPU-to-GPU within a node,
> InfiniBand or RoCE for communication between nodes,
> and GPUDirect Storage for feeding massive datasets
> straight from disk to GPU memory
> make your infra resilient:
> checkpoint your training constantly,
> because something will crash;
> monitor node health so you can kill or restart
> sick nodes before they poison your run
> scaling isn't just "add more GPUs"
> you have to pick and tune the right parallelism:
> data parallelism (DP), pipeline parallelism (PP), tensor parallelism (TP),
> or fully sharded data parallel (FSDP);
> the right combo can double your throughput,
> the wrong one can bottleneck you instantly

to recap
> always start with WHY
> define the core reason you're training a model
> is it research, a custom production need, or to fill an open-source gap?
> spec what you need: architecture, model size, data mix, assistant type
> transformer or hybrid
> set your model size
> design the right data mixture
> decide what kind of assistant or
> use case you're targeting
> build infra for the job, plan for chaos, pick your stability tricks
> build infrastructure that matches your goals
> choose the right GPUs
> set up reliable storage
> and plan for network bottlenecks
> expect failures, weird bugs,
> and sudden bottlenecks at scale
> select your stability tricks in advance:
> know which techniques you'll use to fight loss spikes,
> unstable gradients, and hardware hiccups

closing notes
> the pace of LLM development is relentless,
> but the underlying principles never go out of style
> and this PDF covers what actually matters
> no matter how fast the field changes
> systematic experimentation is everything
> run controlled tests, change one variable at a time, and document every step
> sharp debugging instincts will save you
> more time (and compute budget) than any paper or library
> deep knowledge of both your software stack
> and your hardware is the ultimate unfair advantage;
> know your code, know your chips
> in the end, success comes from relentless curiosity,
> tight feedback loops, and a willingness to question everything
> even your own assumptions

if I had this two years ago, it would have saved me so much time
> if you're building LLMs,
> read this before you burn GPU months

happy hacking
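One of the stability tricks listed above, z-loss regularization, fits in a few lines: penalize the squared log-partition of the output softmax so logits don't drift to large magnitudes. A minimal sketch under stated assumptions (the 1e-4 coefficient is a commonly used value, not a prescription from the playbook):

```python
import numpy as np

def z_loss(logits, coeff=1e-4):
    """Auxiliary z-loss: coeff * mean((log Z)^2), where
    log Z = logsumexp(logits), computed stably via the max trick."""
    m = logits.max(axis=-1, keepdims=True)
    log_z = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return coeff * (log_z ** 2).mean()

# Well-scaled logits incur far less penalty than drifted ones.
small = z_loss(np.array([[1.0, -1.0, 0.5]]))
large = z_loss(np.array([[100.0, -100.0, 50.0]]))
```

In training, this term is added to the cross-entropy loss so the gradient gently pulls log Z back toward zero.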

EdinburghNLP retweeted
Irina Saparina@irisaparina·
Reasoning models are powerful, but they burn thousands of tokens on potentially wrong interpretations for ambiguous requests! 👉 We teach models to think about intent first and provide all interpretations and answers in a single response via RL with dual reward. 🧵1/6
EdinburghNLP retweeted
Edoardo Ponti@PontiEdoardo·
Finally, you can count the r's in strawberry and check if 3.11 is higher than 3.9 without tokenisation interfering: Here's Bolmo, a fully open byte-level LLM with latent tokenisation, derived from a SOTA LLM (Olmo 3). Promising on coding and char-level understanding!
Ai2@allen_ai

Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵
