Gregor Bachmann

121 posts

@GregorBachmann1

I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. https://t.co/yWdDEV6Z15

Joined May 2022
402 Following · 379 Followers
Pinned Tweet
Gregor Bachmann @GregorBachmann1
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version arxiv.org/abs/2306.13575! @SAnagnostidis and I managed to push performance even further 🔥
AK @_akhaliq

Scaling MLPs: A Tale of Inductive Bias

paper page: huggingface.co/papers/2306.13…

In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of this hypothesis. To that end, MLPs offer an ideal test bed, being completely free of any inductive bias. (2) MLPs have almost exclusively been the main protagonist in the deep learning theory literature due to their mathematical simplicity, serving as a proxy to explain empirical phenomena observed for more complex architectures. Surprisingly, experimental datapoints for MLPs are very difficult to find in the literature, especially when coupled with large pre-training protocols.

This discrepancy between practice and theory is worrying: Do MLPs reflect the empirical advances exhibited by practical models? Or do theorists need to rethink the role of MLPs as a proxy? We provide insights into both these aspects. We show that the performance of MLPs drastically improves with scale (93% on CIFAR10, 79% on CIFAR100, 69% on TinyImageNet), highlighting that a lack of inductive bias can indeed be compensated. We observe that MLPs faithfully mimic the behaviour of their modern counterparts, although some components of the learning setting surprisingly exhibit stronger or unexpected behaviours. Due to their inherent computational efficiency, large pre-training experiments become more accessible for academic researchers. All of our experiments were run on a single GPU.

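To make the setup concrete, here is a minimal sketch of the kind of model studied above: an all-MLP classifier that flattens 32x32 CIFAR-10 images into a vector and applies only fully connected layers, with no convolutions or attention. The width/depth values and class name are illustrative assumptions, not the paper's exact architecture or training recipe.

import torch
import torch.nn as nn

class ScalingMLPSketch(nn.Module):
    def __init__(self, in_dim=32 * 32 * 3, width=1024, depth=6, num_classes=10):
        super().__init__()
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers.append(nn.Linear(width, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):              # x: (batch, 3, 32, 32) CIFAR-10 images
        return self.net(x)             # logits: (batch, num_classes)

model = ScalingMLPSketch()
logits = model(torch.randn(8, 3, 32, 32))   # purely fully connected, no spatial prior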
Gregor Bachmann reposted
François Fleuret @francoisfleuret
Had a chat at NeurIPS with @_vaishnavh about the failure of next-token prediction + teacher forcing, and he has this wonderful minimal synthetic problem that IMO encompasses all the problems with / reasons for "reasoning" 1/5
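For readers unfamiliar with the training setup being criticised here, below is a rough sketch of next-token prediction with teacher forcing: during training the model always conditions on the ground-truth prefix, while at test time it must consume its own predictions. The tiny GRU model and random data are purely illustrative assumptions, not the synthetic problem from the thread.

import torch
import torch.nn.functional as F

vocab, d = 100, 32
emb = torch.nn.Embedding(vocab, d)
rnn = torch.nn.GRU(d, d, batch_first=True)
head = torch.nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (4, 16))        # (batch, seq) toy sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # teacher forcing: ground-truth prefix as input
hidden, _ = rnn(emb(inputs))                     # (batch, seq-1, d)
loss = F.cross_entropy(head(hidden).reshape(-1, vocab), targets.reshape(-1))
# At inference the model instead feeds back its own samples token by token,
# which is where errors can compound on planning-style problems.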
Gregor Bachmann reposted
Vaishnavh Nagarajan @_vaishnavh
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT Or please stop by our poster right after @ East Exhibition Hall A-B #E-2505 11am-1:30pm. (Hope you enjoy some silly human drawings!)
Gregor Bachmann reposted
Edward Milsom @edward_milsom
What's some "must read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to understand better the link between optimisation and generalisation. arxiv.org/abs/2302.12091
Gregor Bachmann reposted
Vaishnavh Nagarajan @_vaishnavh
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
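As a rough illustration of the multi-token-prediction idea mentioned in the thread (not the paper's actual architecture, and without its seed-conditioning procedure), one common variant attaches k output heads to a shared trunk, with head i trained to predict the token i steps ahead. The sizes and names below are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d, k = 100, 64, 4
trunk = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, d), nn.GELU())
heads = nn.ModuleList(nn.Linear(d, vocab) for _ in range(k))

tokens = torch.randint(0, vocab, (2, 32))           # toy (batch, seq) data
h = trunk(tokens)                                   # shared representation: (batch, seq, d)
loss = 0.0
for i, head in enumerate(heads, start=1):
    logits = head(h[:, :-i])                        # head i predicts i positions ahead
    loss = loss + F.cross_entropy(logits.reshape(-1, vocab), tokens[:, i:].reshape(-1))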
Gregor Bachmann reposted
AK @_akhaliq
Nvidia just announced Towards Learning to Complete Anything in Lidar
Gregor Bachmann reposted
Dimitri von Rütte @dvruette
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
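The tweet above only announces the model, but for context: discrete diffusion generation broadly works by starting from a fully masked sequence and iteratively re-sampling positions, so earlier choices can be revised, which is what makes self-correction possible at all. The sketch below is a generic caricature of that decoding loop with a random stand-in for the trained denoiser; it is not GIDD itself.

import torch

MASK, vocab, seq_len, steps = 0, 50, 16, 8

def denoiser_logits(tokens):
    # Hypothetical stand-in for a trained denoising network.
    return torch.randn(tokens.shape[0], tokens.shape[1], vocab)

x = torch.full((1, seq_len), MASK)
for t in range(steps):
    probs = denoiser_logits(x).softmax(dim=-1)
    proposal = torch.distributions.Categorical(probs).sample()
    # Re-sample a subset of positions each step; already-decoded tokens may be
    # overwritten, which is the mechanism that allows mistakes to be corrected.
    update = torch.rand(1, seq_len) < 1.0 / (steps - t)
    x = torch.where(update, proposal, x)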
Gregor Bachmann reposted
Ayça Takmaz @aycatakmaz
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see zurichai.ch/events/zurichc… for additional details!
Gregor Bachmann reposted
Ayça Takmaz @aycatakmaz
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚 opensun3d.github.io
Francis Engelmann @FrancisEngelman

Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ opensun3d.github.io We will be hosting: - prized challenge 🏆 (see scenefun3d.github.io) - paper track 🗞️ - exciting keynote speakers  👩‍🏫 #CVPR2025

Gregor Bachmann @GregorBachmann1
@LukaszJDebowski @tpimentelms Thanks for checking out our work! We indeed discuss how these works relate to our results (and straight-line programs in general) in section 7. We were not aware of your work; thanks for bringing it to our attention, we will read it carefully!
Łukasz Dębowski @LukaszJDebowski
@tpimentelms @GregorBachmann1 I wonder if you relate to much earlier work in grammar based coding and the smallest grammar problem: Kieffer and Yang 2000, Charikar et al 2005, Dębowski 2011. 🙂
Gregor Bachmann reposted
Tiago Pimentel @tpimentelms
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem!🤯 New paper + P. Whittington, @GregorBachmann1 :) arxiv.org/abs/2412.15210
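To see why this is a greedy procedure, here is a toy sketch of the standard BPE training loop: at every step it merges whichever adjacent pair is currently most frequent, with no lookahead, whereas an optimal tokeniser would have to account for the global effect of its merge choices on compression. The toy corpus and helper name are illustrative; real implementations operate on word frequencies and byte-level alphabets.

from collections import Counter

def train_bpe(text, num_merges=10):
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]     # greedy choice: most frequent pair right now
        merges.append((a, b))
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                new_seq.append(a + b)            # replace the pair with a merged symbol
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq = new_seq
    return merges, seq

merges, compressed = train_bpe("abababcabab", num_merges=3)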
Gregor Bachmann reposted
Enis Simsar @enisimsar
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
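For intuition on what "merging multiple LoRA models" means mechanically, below is a hedged sketch of the simplest baseline: adding a weighted sum of each adapter's low-rank update B_i @ A_i onto the base weight. LoRACLR's actual contrastive alignment objective is not shown here; the function and variable names are assumptions.

import torch

def merge_loras(base_weight, loras, scales):
    """base_weight: (out, in); loras: list of (A, B) with A: (r, in), B: (out, r)."""
    merged = base_weight.clone()
    for (A, B), s in zip(loras, scales):
        merged += s * (B @ A)                    # add each adapter's low-rank delta
    return merged

W = torch.randn(64, 64)
loras = [(torch.randn(4, 64), torch.randn(64, 4)) for _ in range(3)]
W_merged = merge_loras(W, loras, scales=[0.5, 0.3, 0.2])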
Gregor Bachmann reposted
Bobby @bobby_he
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
Bobby @bobby_he

Updated camera ready arxiv.org/abs/2405.19279. New results include:
- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- scaling to 7B params
- showing our methods to reduce OFs translate to PTQ int8 quantisation ease.
Check it out!

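As a concrete (if simplified) handle on what "outlier features" means here: a common diagnostic is how heavy-tailed the per-channel activation scales of a layer are, since a few channels with vastly larger magnitudes are exactly what makes int8 post-training quantisation painful. The kurtosis-style metric below is an assumption for illustration, not necessarily the paper's exact definition.

import torch

def outlier_score(acts):
    """acts: (num_tokens, num_channels) activations from one layer."""
    scale = acts.abs().mean(dim=0)                 # average magnitude per channel
    z = (scale - scale.mean()) / scale.std()
    return (z ** 4).mean()                         # large value => a few channels dominate

acts = torch.randn(1024, 512)
acts[:, 7] *= 30.0                                 # inject an artificial outlier channel
print(outlier_score(acts))                         # much larger than for the clean activations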
Gregor Bachmann @GregorBachmann1
@neurallambda @dvruette Yes, tokenization is one issue that it overcomes pretty easily somehow! o1 is still doing something interesting imo, GPT4o randomly guesses between David and Victoria and never gives the correct reasoning (capital cities). Navigating the embeddings still seems crucial.
neurallambda @neurallambda
@dvruette @GregorBachmann1 A "what do they have in common" problem is typically very easy, given the way embeddings and latents superpose lots of info that can then interact. But your example overcomes pretty severe tokenization issues, crazy. During thinking it must have spelled it out, but still?!
Dimitri von Rütte @dvruette
consider me impressed! odds of this being in the training set are very low