Ekaterina Lobacheva

153 posts

@KateLobacheva

Postdoc @Mila_Quebec @UMontreal Like to explain unexpected behavior of neural nets 🤯

Montréal, Québec · Joined December 2018
443 Following · 631 Followers
Pinned Tweet
Ekaterina Lobacheva@KateLobacheva·
Happy to be one of the organizers of the ICML Workshop on Weight-Space Symmetries 🥳 Submit your work by April 24! #weightsymmetry2026 #ICML2026
Weight Space Symmetries @ ICML 2026 @weightsymmetry

📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026

Ekaterina Lobacheva retweeted
Peter Romov@romovpa·
Autoresearch can discover SOTA white-box adversarial attacks on LLMs. We gave Claude 30+ existing GCG-like algorithms and access to a compute cluster, and it quickly learned to combine them into new methods that outperform all existing ones. Here’s what that looks like:
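For context on the "GCG-like" building block the thread refers to, here is a minimal sketch of one greedy coordinate gradient (GCG) step. It is not the autoresearch pipeline or the new combined methods; the loss here is a toy stand-in, whereas a real attack uses the target LLM's loss on a desired completion.

```python
# Minimal sketch of one GCG-style step: gradient over a one-hot relaxation,
# top-k candidate substitutions per position, exact re-evaluation, keep the best.
# The embedding matrix and loss are toy placeholders (assumptions), not a real LLM.
import torch

torch.manual_seed(0)
vocab_size, seq_len, dim, top_k, n_candidates = 100, 8, 16, 8, 32

embedding = torch.randn(vocab_size, dim)       # frozen token embeddings (toy)
target_direction = torch.randn(seq_len, dim)   # toy objective (assumption)

def loss_fn(embedded_suffix: torch.Tensor) -> torch.Tensor:
    # Toy differentiable loss; a real attack would use the LLM's cross-entropy
    # on the harmful target string given the adversarial suffix.
    return -(embedded_suffix * target_direction).sum()

def gcg_step(suffix_ids: torch.Tensor) -> torch.Tensor:
    # 1) Gradient of the loss w.r.t. a one-hot relaxation of the suffix tokens.
    one_hot = torch.nn.functional.one_hot(suffix_ids, vocab_size).float()
    one_hot.requires_grad_(True)
    loss = loss_fn(one_hot @ embedding)
    grad = torch.autograd.grad(loss, one_hot)[0]     # (seq_len, vocab_size)

    # 2) For each position, keep the top-k tokens whose substitution most
    #    decreases the linearized loss.
    candidates = (-grad).topk(top_k, dim=-1).indices

    # 3) Sample single-token substitutions, evaluate the true loss, keep the best.
    best_ids, best_loss = suffix_ids, loss.item()
    for _ in range(n_candidates):
        pos = torch.randint(seq_len, (1,)).item()
        tok = candidates[pos][torch.randint(top_k, (1,)).item()]
        trial = suffix_ids.clone()
        trial[pos] = tok
        trial_loss = loss_fn(embedding[trial]).item()
        if trial_loss < best_loss:
            best_ids, best_loss = trial, trial_loss
    return best_ids

suffix = torch.randint(vocab_size, (seq_len,))
for _ in range(10):
    suffix = gcg_step(suffix)
```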
Ekaterina Lobacheva retweeted
Nadia Chirkova@nadiinchi·
I will be presenting LLM-as-a-qualitative-judge at the MME workshop @ #EACL2026! Will be happy to chat about the importance of analysing frequent error cases in NLP applications and how we attempt to automate it in LLM-as-a-qualitative-judge! Mar 28, 16:50, poster hall #NLProc
Ekaterina Lobacheva retweeted
Boris Hanin@BorisHanin·
🚨 2026 @Princeton ML Theory Summer School
Meet your peers. Learn from mini-courses by:
- Subhabrata Sen
- Lenaic Chizat
- Sinho Chewi
- Elliot Paquette
- Elad Hazan
- Surya Ganguli
August 3 - 14, 2026. One week left to apply! Link 👇
Sponsors: @NSF, @PrincetonAInews, @EPrinceton, @JaneStreetGroup, @DARPA, @PrincetonPLI, Princeton NAM, Princeton AI2, Princeton PACM
Some amazing speakers from this and previous years: @subhabratasen90, @LenaicChizat, @poseypaquet, @HazanPrinceton, @SuryaGanguli, @Andrea__M, @TheodorMisiakie, @KrzakalaF, @_brloureiro, @rakhlin, @DimaKrotov, @CPehlevan, @SoledadVillar5, @SebastienBubeck, @tengyuma
Ekaterina Lobacheva retweeted
Antonio Orvieto@orvieto_antonio·
Optimization theory for adaptive methods actually predicts most of what we know about hyperparameter scaling in LLM pretraining, and suggests new strategies as well. We did a deep dive here.
Ekaterina Lobacheva retweeted
Peter Hase@peterbhase·
New Schmidt Sciences RFP on AI Interpretability: We need new tools for detecting and mitigating deceptive behaviors exhibited by LLMs.
Funding for $300k-$1M projects.
Deadline: May 26th, AoE.
RFP: schmidtsciences.smapply.io/prog/2026_inte…
Please share with anyone who may be interested!
Ekaterina Lobacheva retweeted
Alessandro Salvatore@AleSalvatore00·
Why can't we solve adversarial examples? After a decade of work, neural nets still get fooled by imperceptible noise. We think we finally know the geometric reason why — and it connects to AI alignment. 🧵
Ekaterina Lobacheva retweeted
Vaibhav Adlakha@vaibhav_adlakha·
Your LLM already knows the answer. Why is your embedding model still encoding the question?
🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass — without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text.
🏆 SOTA self-supervised embeddings
🛡️ Free transfer of instruction-following, safety, and reasoning
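The thread does not spell out how LLM2Vec-Gen works, so the snippet below only illustrates the generic idea of getting an embedding from a frozen causal LM in a single forward pass (last-layer, last-token hidden state). The model name and pooling choice are my assumptions, not the paper's method.

```python
# Generic illustration of "frozen LLM, one forward pass, embedding out".
# Not LLM2Vec-Gen itself: the model ("gpt2") and last-token pooling are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any frozen causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    # Last layer, last token: one vector from a single forward pass,
    # without generating any answer tokens.
    return out.hidden_states[-1][0, -1]

print(embed("What is the capital of France?").shape)  # torch.Size([768]) for gpt2
```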
Ekaterina Lobacheva retweeted
Nathan Godey@nthngdy·
🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇
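The tweet does not include the measurement itself; as a rough way to see what "signal surviving the LM head" could mean, here is a toy probe comparing gradient norms at the logits with what arrives back at the pre-head hidden states. Dimensions and the random setup are illustrative, not the paper's experiment.

```python
# Toy probe: how large is the loss gradient at the logits vs. what survives
# back through the LM head to the hidden states? Setup is illustrative only.
import torch

torch.manual_seed(0)
batch, hidden_dim, vocab_size = 32, 512, 50_000

hidden = torch.randn(batch, hidden_dim, requires_grad=True)   # pre-LM-head activations
lm_head = torch.nn.Linear(hidden_dim, vocab_size, bias=False)
targets = torch.randint(vocab_size, (batch,))

logits = lm_head(hidden)
logits.retain_grad()          # keep the gradient at the non-leaf logits
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()

print(f"grad norm at logits:   {logits.grad.norm():.4f}")
print(f"grad norm at hidden:   {hidden.grad.norm():.4f}")
print(f"ratio (hidden/logits): {hidden.grad.norm() / logits.grad.norm():.3f}")
```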
Ekaterina Lobacheva retweeted
Yizhou Liu@YizhouLiu0·
💡Neural Scaling Laws Trilogy: Superposition yields 1/width law, averaging yields 1/depth law, and low-entropy universality yields 1/3-time law. At the optimal shape, Chinchilla scaling laws can be explained. Improvements in scaling are hypothesized. 👉liuyz0.github.io/blog/2026/NSLT/
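The thread states three power laws; writing them as a single additive loss ansatz below is my own reading of how they might combine, not a formula quoted from the blog post.

```latex
% Three laws as stated in the thread: superposition -> 1/width, averaging -> 1/depth,
% low-entropy universality -> t^{-1/3}. Combining them additively is an assumption.
\[
  L(\text{width}, \text{depth}, t) \;\approx\; L_\infty
    \;+\; \frac{a}{\text{width}}
    \;+\; \frac{b}{\text{depth}}
    \;+\; \frac{c}{t^{1/3}}
\]
```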
Ekaterina Lobacheva retweeted
Benjamin Thérien@benjamintherien·
Are frontier LLMs trained across datacenters? One thing is certain: if the pre-training optimizer’s critical batch size is too small, they are NOT! Excited to announce MuLoCo, a pre-training optimizer that can efficiently pre-train across datacenters while having large enough batch sizes to warrant doing so. 🧵1/N
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
Streaming Reinforcement Learning (RL) is a huge challenge: transitions are used once and discarded immediately. This makes agents extremely sample-inefficient. But what if we could "squeeze" more information out of every single frame? Check out our latest paper!
Ekaterina Lobacheva retweeted
Damien Teney@DamienTeney·
🔥What if web text isn’t the best place to start training LLMs? Our latest work shows that warming up models on procedural data (e.g. from formal languages & simple algorithms) speeds up subsequent pretraining on language, code, and math, on models up to 1.3B parameters⬇️🧵
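The tweet mentions procedural data from formal languages and simple algorithms; the toy generator below produces balanced-bracket (Dyck-style) strings as one concrete example of such data. The specific language and sampling scheme are my choices, not necessarily the ones used in the paper.

```python
# Toy generator for one kind of procedural warm-up data: strings from a simple
# formal language (well-nested brackets). Illustrative, not the paper's recipe.
import random

PAIRS = [("(", ")"), ("[", "]"), ("{", "}")]

def dyck_string(max_depth: int = 4, max_branch: int = 3) -> str:
    """Sample a well-nested bracket string up to a given nesting depth."""
    if max_depth == 0:
        return ""
    parts = []
    for _ in range(random.randint(1, max_branch)):
        left, right = random.choice(PAIRS)
        parts.append(left + dyck_string(max_depth - 1, max_branch) + right)
    return "".join(parts)

random.seed(0)
for line in (dyck_string() for _ in range(5)):
    print(line)
```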
Ekaterina Lobacheva retweeted
tom@tvergarabrowne·
first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.
Ekaterina Lobacheva retweeted
Damien Ferbach@damien_ferbach·
1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵
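The thread only says ADANA uses log-time schedules for momentum and weight decay with no extra hyperparameters; the exact functional forms are not given, so the schedules below are purely hypothetical placeholders showing what "a coefficient that moves with log(step)" could look like.

```python
# Hypothetical log-time schedules (assumed forms, not the actual ADANA rules):
# coefficients that change with log(step) rather than with step.
import math

def log_time_momentum(step: int, beta_start: float = 0.9, beta_end: float = 0.999,
                      total_steps: int = 100_000) -> float:
    # Interpolate between two momentum values in log(step).
    frac = math.log(step + 1) / math.log(total_steps + 1)
    return beta_start + (beta_end - beta_start) * min(frac, 1.0)

def log_time_weight_decay(step: int, wd0: float = 0.1) -> float:
    # Weight decay that shrinks like 1/log(step) (again, an assumed form).
    return wd0 / max(math.log(step + math.e), 1.0)

for step in (1, 100, 10_000, 100_000):
    print(step, round(log_time_momentum(step), 4), round(log_time_weight_decay(step), 4))
```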
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
‘The Markovian Thinker’, developed by our lab, has been accepted at @iclr_conf! 

This work achieves long reasoning without the quadratic attention tax: LLMs reason in chunks with a bounded state, achieving linear compute, constant memory, and scaling beyond their training limit!
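The tweet gives only the high-level idea, so here is a structural sketch of "reasoning in chunks with a bounded carried state": generate a chunk, compress the context into a fixed-size carry, repeat. The two helper functions are stubs standing in for an LLM call and a state-compression rule; this shows the control flow, not the paper's method.

```python
# Structural sketch of chunked reasoning with a bounded state: each step sees
# only the question plus a fixed-size carry, so compute grows linearly in the
# number of chunks and memory stays constant. Helpers are toy stubs (assumptions).
from typing import Callable

def markovian_reasoning(
    question: str,
    generate_chunk: Callable[[str, str], str],   # (question, carry) -> reasoning chunk
    compress: Callable[[str, str], str],         # (carry, chunk) -> new bounded carry
    max_chunks: int = 8,
    carry_limit: int = 200,                      # characters; the bounded state
) -> str:
    carry = ""
    for _ in range(max_chunks):
        chunk = generate_chunk(question, carry)
        carry = compress(carry, chunk)[:carry_limit]   # state never exceeds the bound
        if "ANSWER:" in chunk:
            return chunk.split("ANSWER:", 1)[1].strip()
    return carry

# Toy stubs so the sketch runs end-to-end.
def toy_generate(question: str, carry: str) -> str:
    return f"thinking about '{question}' given [{carry[-40:]}] ... ANSWER: 42"

def toy_compress(carry: str, chunk: str) -> str:
    return (carry + " | " + chunk)[-200:]

print(markovian_reasoning("What is 6 x 7?", toy_generate, toy_compress))
```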
Ekaterina Lobacheva retweeted
Chandar Lab@ChandarLab·
New work from our lab, accepted @iclr_conf : "The Expressive Limits of Diagonal SSMs for State-Tracking" We give a complete characterization of what diagonal SSMs can and cannot compute on state-tracking tasks and the answer is deeply connected to group theory. 🧵👇
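The thread does not include the argument itself; one standard piece of intuition for why diagonal SSMs hit a group-theoretic wall (my gloss, not necessarily the paper's characterization) is that the diagonal transition part commutes, and commuting transitions cannot encode order-sensitive, non-abelian state.

```latex
% A diagonal SSM layer unrolls (with h_0 = 0) as h_t = a_t \odot h_{t-1} + b_t, so the
% transition part is a product of diagonal matrices, and diagonal matrices commute:
\[
  h_T \;=\; \sum_{t=1}^{T} \Big( \prod_{s=t+1}^{T} \operatorname{diag}(a_s) \Big) b_t ,
  \qquad
  \operatorname{diag}(a)\operatorname{diag}(a') \;=\; \operatorname{diag}(a')\operatorname{diag}(a).
\]
% The accumulated transition therefore depends only on the multiset of the a_s,
% not on their order, which is exactly what non-abelian state tracking
% (e.g. composing permutations) requires.
```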