Gregor Bachmann

121 posts

@GregorBachmann1

I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. https://t.co/yWdDEV6Z15

Joined May 2022
402 Following · 379 Followers
Pinned Tweet
Gregor Bachmann @GregorBachmann1
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version arxiv.org/abs/2306.13575! @SAnagnostidis and I managed to push performance even further 🔥
AK @_akhaliq

Scaling MLPs: A Tale of Inductive Bias

paper page: huggingface.co/papers/2306.13…

In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of this hypothesis. To that end, MLPs offer an ideal test bed, being completely free of any inductive bias. (2) MLPs have almost exclusively been the main protagonist in the deep learning theory literature due to their mathematical simplicity, serving as a proxy to explain empirical phenomena observed for more complex architectures. Surprisingly, experimental datapoints for MLPs are very difficult to find in the literature, especially when coupled with large pre-training protocols.

This discrepancy between practice and theory is worrying: Do MLPs reflect the empirical advances exhibited by practical models? Or do theorists need to rethink the role of MLPs as a proxy? We provide insights into both these aspects. We show that the performance of MLPs drastically improves with scale (93% on CIFAR10, 79% on CIFAR100, 69% on TinyImageNet), highlighting that a lack of inductive bias can indeed be compensated. We observe that MLPs faithfully mimic the behaviour of their modern counterparts, although some components of the learning setting surprisingly exhibit stronger or unexpected behaviours. Due to their inherent computational efficiency, large pre-training experiments become more accessible for academic researchers. All of our experiments were run on a single GPU.

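To make the setup concrete, here is a minimal sketch of the kind of model studied above: an all-MLP classifier that flattens 32x32 CIFAR-10 images into a vector and applies only fully connected layers, with no convolutions or attention. The width/depth values and class name are illustrative assumptions, not the paper's exact architecture or training recipe.

import torch
import torch.nn as nn

class ScalingMLPSketch(nn.Module):
    def __init__(self, in_dim=32 * 32 * 3, width=1024, depth=6, num_classes=10):
        super().__init__()
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers.append(nn.Linear(width, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):              # x: (batch, 3, 32, 32) CIFAR-10 images
        return self.net(x)             # logits: (batch, num_classes)

model = ScalingMLPSketch()
logits = model(torch.randn(8, 3, 32, 32))   # purely fully connected, no spatial prior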
Gregor Bachmann reposted
François Fleuret @francoisfleuret
Had a chat at NeurIPS with @_vaishnavh about the failure of next-token prediction + teacher forcing, and he has this wonderful minimal synthetic problem that IMO encompasses all the problems with / reasons for "reasoning" 1/5
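For readers unfamiliar with the training setup being criticised here, below is a rough sketch of next-token prediction with teacher forcing: during training the model always conditions on the ground-truth prefix, while at test time it must consume its own predictions. The tiny GRU model and random data are purely illustrative assumptions, not the synthetic problem from the thread.

import torch
import torch.nn.functional as F

vocab, d = 100, 32
emb = torch.nn.Embedding(vocab, d)
rnn = torch.nn.GRU(d, d, batch_first=True)
head = torch.nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (4, 16))        # (batch, seq) toy sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # teacher forcing: ground-truth prefix as input
hidden, _ = rnn(emb(inputs))                     # (batch, seq-1, d)
loss = F.cross_entropy(head(hidden).reshape(-1, vocab), targets.reshape(-1))
# At inference the model instead feeds back its own samples token by token,
# which is where errors can compound on planning-style problems.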
Gregor Bachmann reposted
Vaishnavh Nagarajan @_vaishnavh
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT Or please stop by our poster right after @ East Exhibition Hall A-B #E-2505 11am-1:30pm. (Hope you enjoy some silly human drawings!)
Gregor Bachmann reposted
Edward Milsom @edward_milsom
What's some "must read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to understand better the link between optimisation and generalisation. arxiv.org/abs/2302.12091
Gregor Bachmann reposted
Vaishnavh Nagarajan @_vaishnavh
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
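As a rough illustration of the multi-token-prediction idea mentioned in the thread (not the paper's actual architecture, and without its seed-conditioning procedure), one common variant attaches k output heads to a shared trunk, with head i trained to predict the token i steps ahead. The sizes and names below are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d, k = 100, 64, 4
trunk = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, d), nn.GELU())
heads = nn.ModuleList(nn.Linear(d, vocab) for _ in range(k))

tokens = torch.randint(0, vocab, (2, 32))           # toy (batch, seq) data
h = trunk(tokens)                                   # shared representation: (batch, seq, d)
loss = 0.0
for i, head in enumerate(heads, start=1):
    logits = head(h[:, :-i])                        # head i predicts i positions ahead
    loss = loss + F.cross_entropy(logits.reshape(-1, vocab), tokens[:, i:].reshape(-1))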
Gregor Bachmann reposted
AK @_akhaliq
Nvidia just announced Towards Learning to Complete Anything in Lidar
Gregor Bachmann reposted
Dimitri von Rütte @dvruette
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
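The tweet above only announces the model, but for context: discrete diffusion generation broadly works by starting from a fully masked sequence and iteratively re-sampling positions, so earlier choices can be revised, which is what makes self-correction possible at all. The sketch below is a generic caricature of that decoding loop with a random stand-in for the trained denoiser; it is not GIDD itself.

import torch

MASK, vocab, seq_len, steps = 0, 50, 16, 8

def denoiser_logits(tokens):
    # Hypothetical stand-in for a trained denoising network.
    return torch.randn(tokens.shape[0], tokens.shape[1], vocab)

x = torch.full((1, seq_len), MASK)
for t in range(steps):
    probs = denoiser_logits(x).softmax(dim=-1)
    proposal = torch.distributions.Categorical(probs).sample()
    # Re-sample a subset of positions each step; already-decoded tokens may be
    # overwritten, which is the mechanism that allows mistakes to be corrected.
    update = torch.rand(1, seq_len) < 1.0 / (steps - t)
    x = torch.where(update, proposal, x)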
Gregor Bachmann reposted
Ayça Takmaz @aycatakmaz
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see zurichai.ch/events/zurichc… for additional details!
Gregor Bachmann reposted
Ayça Takmaz @aycatakmaz
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚 opensun3d.github.io
Francis Engelmann @FrancisEngelman

Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ opensun3d.github.io We will be hosting: - prized challenge 🏆 (see scenefun3d.github.io) - paper track 🗞️ - exciting keynote speakers  👩‍🏫 #CVPR2025

Gregor Bachmann @GregorBachmann1
@LukaszJDebowski @tpimentelms Thanks for checking out our work! We indeed discuss how these works relate to our results (and straight-line programs in general) in section 7. We were not aware of your work; thanks for bringing it to our attention, we will read it carefully!
Łukasz Dębowski @LukaszJDebowski
@tpimentelms @GregorBachmann1 I wonder if you relate to much earlier work in grammar based coding and the smallest grammar problem: Kieffer and Yang 2000, Charikar et al 2005, Dębowski 2011. 🙂
Gregor Bachmann reposted
Tiago Pimentel @tpimentelms
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem!🤯 New paper + P. Whittington, @GregorBachmann1 :) arxiv.org/abs/2412.15210
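To see why this is a greedy procedure, here is a toy sketch of the standard BPE training loop: at every step it merges whichever adjacent pair is currently most frequent, with no lookahead, whereas an optimal tokeniser would have to account for the global effect of its merge choices on compression. The toy corpus and helper name are illustrative; real implementations operate on word frequencies and byte-level alphabets.

from collections import Counter

def train_bpe(text, num_merges=10):
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]     # greedy choice: most frequent pair right now
        merges.append((a, b))
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                new_seq.append(a + b)            # replace the pair with a merged symbol
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq = new_seq
    return merges, seq

merges, compressed = train_bpe("abababcabab", num_merges=3)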
Gregor Bachmann reposted
Enis Simsar @enisimsar
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
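For intuition on what "merging multiple LoRA models" means mechanically, below is a hedged sketch of the simplest baseline: adding a weighted sum of each adapter's low-rank update B_i @ A_i onto the base weight. LoRACLR's actual contrastive alignment objective is not shown here; the function and variable names are assumptions.

import torch

def merge_loras(base_weight, loras, scales):
    """base_weight: (out, in); loras: list of (A, B) with A: (r, in), B: (out, r)."""
    merged = base_weight.clone()
    for (A, B), s in zip(loras, scales):
        merged += s * (B @ A)                    # add each adapter's low-rank delta
    return merged

W = torch.randn(64, 64)
loras = [(torch.randn(4, 64), torch.randn(64, 4)) for _ in range(3)]
W_merged = merge_loras(W, loras, scales=[0.5, 0.3, 0.2])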
Gregor Bachmann reposted
Bobby @bobby_he
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
Bobby @bobby_he

Updated camera ready arxiv.org/abs/2405.19279. New results include:
- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- scaling to 7B params
- showing our methods to reduce OFs translate to PTQ int8 quantisation ease.
Check it out!

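As a concrete (if simplified) handle on what "outlier features" means here: a common diagnostic is how heavy-tailed the per-channel activation scales of a layer are, since a few channels with vastly larger magnitudes are exactly what makes int8 post-training quantisation painful. The kurtosis-style metric below is an assumption for illustration, not necessarily the paper's exact definition.

import torch

def outlier_score(acts):
    """acts: (num_tokens, num_channels) activations from one layer."""
    scale = acts.abs().mean(dim=0)                 # average magnitude per channel
    z = (scale - scale.mean()) / scale.std()
    return (z ** 4).mean()                         # large value => a few channels dominate

acts = torch.randn(1024, 512)
acts[:, 7] *= 30.0                                 # inject an artificial outlier channel
print(outlier_score(acts))                         # much larger than for the clean activations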
Gregor Bachmann @GregorBachmann1
@neurallambda @dvruette Yes, tokenization is one issue that it overcomes pretty easily somehow! o1 is still doing something interesting imo, GPT4o randomly guesses between David and Victoria and never gives the correct reasoning (capital cities). Navigating the embeddings still seems crucial.
neurallambda @neurallambda
@dvruette @GregorBachmann1 A "what do they have in common" problem is typically very easy, given the way embeddings and latents superpose lots of info that can then interact. But your example overcomes pretty severe tokenization issues, crazy. During thinking it must have spelled it out, but still?!
Dimitri von Rütte @dvruette
consider me impressed! odds of this being in the training set are very low