Ekaterina Lobacheva

153 posts

@KateLobacheva

Postdoc @Mila_Quebec @UMontreal Like to explain unexpected behavior of neural nets 🤯

Montréal, Québec · Joined December 2018
443 Following · 631 Followers
Pinned Tweet
Ekaterina Lobacheva@KateLobacheva·
Happy to be one of the organizers of the ICML Workshop on Weight-Space Symmetries 🥳 Submit your work by April 24! #weightsymmetry2026 #ICML2026
Weight Space Symmetries @ ICML 2026 @weightsymmetry

📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026

Ekaterina Lobacheva retweeted
Peter Romov@romovpa·
Autoresearch can discover SOTA white-box adversarial attacks on LLMs. We gave Claude 30+ existing GCG-like algorithms and access to a compute cluster, and it quickly learned to combine them into new methods that outperform all existing ones. Here’s what that looks like:
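For context on the "GCG-like" building block the thread refers to, here is a minimal sketch of one greedy coordinate gradient (GCG) step. It is not the autoresearch pipeline or the new combined methods; the loss here is a toy stand-in, whereas a real attack uses the target LLM's loss on a desired completion.

```python
# Minimal sketch of one GCG-style step: gradient over a one-hot relaxation,
# top-k candidate substitutions per position, exact re-evaluation, keep the best.
# The embedding matrix and loss are toy placeholders (assumptions), not a real LLM.
import torch

torch.manual_seed(0)
vocab_size, seq_len, dim, top_k, n_candidates = 100, 8, 16, 8, 32

embedding = torch.randn(vocab_size, dim)       # frozen token embeddings (toy)
target_direction = torch.randn(seq_len, dim)   # toy objective (assumption)

def loss_fn(embedded_suffix: torch.Tensor) -> torch.Tensor:
    # Toy differentiable loss; a real attack would use the LLM's cross-entropy
    # on the harmful target string given the adversarial suffix.
    return -(embedded_suffix * target_direction).sum()

def gcg_step(suffix_ids: torch.Tensor) -> torch.Tensor:
    # 1) Gradient of the loss w.r.t. a one-hot relaxation of the suffix tokens.
    one_hot = torch.nn.functional.one_hot(suffix_ids, vocab_size).float()
    one_hot.requires_grad_(True)
    loss = loss_fn(one_hot @ embedding)
    grad = torch.autograd.grad(loss, one_hot)[0]     # (seq_len, vocab_size)

    # 2) For each position, keep the top-k tokens whose substitution most
    #    decreases the linearized loss.
    candidates = (-grad).topk(top_k, dim=-1).indices

    # 3) Sample single-token substitutions, evaluate the true loss, keep the best.
    best_ids, best_loss = suffix_ids, loss.item()
    for _ in range(n_candidates):
        pos = torch.randint(seq_len, (1,)).item()
        tok = candidates[pos][torch.randint(top_k, (1,)).item()]
        trial = suffix_ids.clone()
        trial[pos] = tok
        trial_loss = loss_fn(embedding[trial]).item()
        if trial_loss < best_loss:
            best_ids, best_loss = trial, trial_loss
    return best_ids

suffix = torch.randint(vocab_size, (seq_len,))
for _ in range(10):
    suffix = gcg_step(suffix)
```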
Ekaterina Lobacheva retweeted
Nadia Chirkova@nadiinchi·
I will be presenting LLM-as-a-qualitative-judge at the MME workshop @ #EACL2026! Will be happy to chat about the importance of analysing frequent error cases in NLP applications and how we attempt to automate it in LLM-as-a-qualitative-judge! Mar 28, 16:50, poster hall #NLProc
Ekaterina Lobacheva retweeted
Boris Hanin@BorisHanin·
🚨 2026 @Princeton ML Theory Summer School
Meet your peers. Learn from mini-courses by:
- Subhabrata Sen
- Lenaic Chizat
- Sinho Chewi
- Elliot Paquette
- Elad Hazan
- Surya Ganguli
August 3 - 14, 2026. One week left to apply! Link 👇
Sponsors: @NSF, @PrincetonAInews, @EPrinceton, @JaneStreetGroup, @DARPA, @PrincetonPLI, Princeton NAM, Princeton AI2, Princeton PACM
Some amazing speakers from this and previous years: @subhabratasen90, @LenaicChizat, @poseypaquet, @HazanPrinceton, @SuryaGanguli, @Andrea__M, @TheodorMisiakie, @KrzakalaF, @_brloureiro, @rakhlin, @DimaKrotov, @CPehlevan, @SoledadVillar5, @SebastienBubeck, @tengyuma
Ekaterina Lobacheva retweeted
Antonio Orvieto@orvieto_antonio·
Optimization theory for adaptive methods actually predicts most of what we know about hyperparameter scaling in LLM pretraining, and suggests new strategies as well. We did a deep dive here.
Ekaterina Lobacheva retweeted
Peter Hase@peterbhase·
New Schmidt Sciences RFP on AI Interpretability: We need new tools for detecting and mitigating deceptive behaviors exhibited by LLMs.
Funding for $300k-$1M projects.
Deadline: May 26th, AoE.
RFP: schmidtsciences.smapply.io/prog/2026_inte…
Please share with anyone who may be interested!
Ekaterina Lobacheva retweeted
Alessandro Salvatore@AleSalvatore00·
Why can't we solve adversarial examples? After a decade of work, neural nets still get fooled by imperceptible noise. We think we finally know the geometric reason why — and it connects to AI alignment. 🧵
Ekaterina Lobacheva retweeted
Vaibhav Adlakha@vaibhav_adlakha·
Your LLM already knows the answer. Why is your embedding model still encoding the question?
🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass — without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text.
🏆 SOTA self-supervised embeddings
🛡️ Free transfer of instruction-following, safety, and reasoning
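The thread does not spell out how LLM2Vec-Gen works, so the snippet below only illustrates the generic idea of getting an embedding from a frozen causal LM in a single forward pass (last-layer, last-token hidden state). The model name and pooling choice are my assumptions, not the paper's method.

```python
# Generic illustration of "frozen LLM, one forward pass, embedding out".
# Not LLM2Vec-Gen itself: the model ("gpt2") and last-token pooling are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any frozen causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    # Last layer, last token: one vector from a single forward pass,
    # without generating any answer tokens.
    return out.hidden_states[-1][0, -1]

print(embed("What is the capital of France?").shape)  # torch.Size([768]) for gpt2
```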
Ekaterina Lobacheva retweeted
Nathan Godey@nthngdy·
🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇
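The tweet does not include the measurement itself; as a rough way to see what "signal surviving the LM head" could mean, here is a toy probe comparing gradient norms at the logits with what arrives back at the pre-head hidden states. Dimensions and the random setup are illustrative, not the paper's experiment.

```python
# Toy probe: how large is the loss gradient at the logits vs. what survives
# back through the LM head to the hidden states? Setup is illustrative only.
import torch

torch.manual_seed(0)
batch, hidden_dim, vocab_size = 32, 512, 50_000

hidden = torch.randn(batch, hidden_dim, requires_grad=True)   # pre-LM-head activations
lm_head = torch.nn.Linear(hidden_dim, vocab_size, bias=False)
targets = torch.randint(vocab_size, (batch,))

logits = lm_head(hidden)
logits.retain_grad()          # keep the gradient at the non-leaf logits
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()

print(f"grad norm at logits:   {logits.grad.norm():.4f}")
print(f"grad norm at hidden:   {hidden.grad.norm():.4f}")
print(f"ratio (hidden/logits): {hidden.grad.norm() / logits.grad.norm():.3f}")
```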
Ekaterina Lobacheva retweeted
Yizhou Liu@YizhouLiu0·
💡Neural Scaling Laws Trilogy: Superposition yields 1/width law, averaging yields 1/depth law, and low-entropy universality yields 1/3-time law. At the optimal shape, Chinchilla scaling laws can be explained. Improvements in scaling are hypothesized. 👉liuyz0.github.io/blog/2026/NSLT/
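The thread states three power laws; writing them as a single additive loss ansatz below is my own reading of how they might combine, not a formula quoted from the blog post.

```latex
% Three laws as stated in the thread: superposition -> 1/width, averaging -> 1/depth,
% low-entropy universality -> t^{-1/3}. Combining them additively is an assumption.
\[
  L(\text{width}, \text{depth}, t) \;\approx\; L_\infty
    \;+\; \frac{a}{\text{width}}
    \;+\; \frac{b}{\text{depth}}
    \;+\; \frac{c}{t^{1/3}}
\]
```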
Ekaterina Lobacheva retweeted
Benjamin Thérien@benjamintherien·
Are frontier LLMs trained across datacenters? One thing is certain: if the pre-training optimizer’s critical batch size is too small, they are NOT! Excited to announce MuLoCo, a pre-training optimizer that can efficiently pre-train across datacenters while having large enough batch sizes to warrant doing so. 🧵1/N
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
Streaming Reinforcement Learning (RL) is a huge challenge: transitions are used once and discarded immediately. This makes agents extremely sample-inefficient. But what if we could "squeeze" more information out of every single frame? Check out our latest paper!
Ekaterina Lobacheva retweeted
Damien Teney@DamienTeney·
🔥What if web text isn’t the best place to start training LLMs? Our latest work shows that warming up models on procedural data (e.g. from formal languages & simple algorithms) speeds up subsequent pretraining on language, code, and math, on models up to 1.3B parameters⬇️🧵
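The tweet mentions procedural data from formal languages and simple algorithms; the toy generator below produces balanced-bracket (Dyck-style) strings as one concrete example of such data. The specific language and sampling scheme are my choices, not necessarily the ones used in the paper.

```python
# Toy generator for one kind of procedural warm-up data: strings from a simple
# formal language (well-nested brackets). Illustrative, not the paper's recipe.
import random

PAIRS = [("(", ")"), ("[", "]"), ("{", "}")]

def dyck_string(max_depth: int = 4, max_branch: int = 3) -> str:
    """Sample a well-nested bracket string up to a given nesting depth."""
    if max_depth == 0:
        return ""
    parts = []
    for _ in range(random.randint(1, max_branch)):
        left, right = random.choice(PAIRS)
        parts.append(left + dyck_string(max_depth - 1, max_branch) + right)
    return "".join(parts)

random.seed(0)
for line in (dyck_string() for _ in range(5)):
    print(line)
```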
Ekaterina Lobacheva retweeted
tom@tvergarabrowne·
first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.
Ekaterina Lobacheva retweeted
Damien Ferbach@damien_ferbach·
1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵
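The thread only says ADANA uses log-time schedules for momentum and weight decay with no extra hyperparameters; the exact functional forms are not given, so the schedules below are purely hypothetical placeholders showing what "a coefficient that moves with log(step)" could look like.

```python
# Hypothetical log-time schedules (assumed forms, not the actual ADANA rules):
# coefficients that change with log(step) rather than with step.
import math

def log_time_momentum(step: int, beta_start: float = 0.9, beta_end: float = 0.999,
                      total_steps: int = 100_000) -> float:
    # Interpolate between two momentum values in log(step).
    frac = math.log(step + 1) / math.log(total_steps + 1)
    return beta_start + (beta_end - beta_start) * min(frac, 1.0)

def log_time_weight_decay(step: int, wd0: float = 0.1) -> float:
    # Weight decay that shrinks like 1/log(step) (again, an assumed form).
    return wd0 / max(math.log(step + math.e), 1.0)

for step in (1, 100, 10_000, 100_000):
    print(step, round(log_time_momentum(step), 4), round(log_time_weight_decay(step), 4))
```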
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
‘The Markovian Thinker’, developed by our lab, has been accepted at @iclr_conf! 

This work achieves long reasoning without the quadratic attention tax: LLMs reason in chunks with a bounded state, achieving linear compute, constant memory, and scaling beyond their training limit!
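The tweet gives only the high-level idea, so here is a structural sketch of "reasoning in chunks with a bounded carried state": generate a chunk, compress the context into a fixed-size carry, repeat. The two helper functions are stubs standing in for an LLM call and a state-compression rule; this shows the control flow, not the paper's method.

```python
# Structural sketch of chunked reasoning with a bounded state: each step sees
# only the question plus a fixed-size carry, so compute grows linearly in the
# number of chunks and memory stays constant. Helpers are toy stubs (assumptions).
from typing import Callable

def markovian_reasoning(
    question: str,
    generate_chunk: Callable[[str, str], str],   # (question, carry) -> reasoning chunk
    compress: Callable[[str, str], str],         # (carry, chunk) -> new bounded carry
    max_chunks: int = 8,
    carry_limit: int = 200,                      # characters; the bounded state
) -> str:
    carry = ""
    for _ in range(max_chunks):
        chunk = generate_chunk(question, carry)
        carry = compress(carry, chunk)[:carry_limit]   # state never exceeds the bound
        if "ANSWER:" in chunk:
            return chunk.split("ANSWER:", 1)[1].strip()
    return carry

# Toy stubs so the sketch runs end-to-end.
def toy_generate(question: str, carry: str) -> str:
    return f"thinking about '{question}' given [{carry[-40:]}] ... ANSWER: 42"

def toy_compress(carry: str, chunk: str) -> str:
    return (carry + " | " + chunk)[-200:]

print(markovian_reasoning("What is 6 x 7?", toy_generate, toy_compress))
```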
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
New work from our lab, accepted @iclr_conf : "The Expressive Limits of Diagonal SSMs for State-Tracking" We give a complete characterization of what diagonal SSMs can and cannot compute on state-tracking tasks and the answer is deeply connected to group theory. 🧵👇
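The thread does not include the argument itself; one standard piece of intuition for why diagonal SSMs hit a group-theoretic wall (my gloss, not necessarily the paper's characterization) is that the diagonal transition part commutes, and commuting transitions cannot encode order-sensitive, non-abelian state.

```latex
% A diagonal SSM layer unrolls (with h_0 = 0) as h_t = a_t \odot h_{t-1} + b_t, so the
% transition part is a product of diagonal matrices, and diagonal matrices commute:
\[
  h_T \;=\; \sum_{t=1}^{T} \Big( \prod_{s=t+1}^{T} \operatorname{diag}(a_s) \Big) b_t ,
  \qquad
  \operatorname{diag}(a)\operatorname{diag}(a') \;=\; \operatorname{diag}(a')\operatorname{diag}(a).
\]
% The accumulated transition therefore depends only on the multiset of the a_s,
% not on their order, which is exactly what non-abelian state tracking
% (e.g. composing permutations) requires.
```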