

Huge news... I've been quietly and assiduously working on something since last year. It's been months in the making, through several rewrites and re-edits, and it's finally ready, so I'm shadow-dropping it now. It's a 3-hour iceberg on @ericweinstein's Geometric Unity. I work on understanding and explaining different Theories of Everything for a living, and this one is unlike any other you've seen. The iceberg covers the graduate-level math, but it also provides explainers at multiple levels throughout for those new to physics and differential geometry. Enjoy.



My SIGGRAPH 2023 presentation of "Winding Numbers on Discrete Surfaces", authored with @MarkGillespie64 and @keenanisalive, is now on YouTube: youtu.be/QnMx3s4_4WY


The best paper award for the Algebra and Geometry track goes to "Does Equivariance Matter at Scale?", presented by Johann Brehmer.


Does equivariance matter at scale? ... When the Twitter discourse gets so tiring that you actually go out and collect EVIDENCE :D

There has been a lot of discussion over the years about whether you should build symmetries into your architecture to get better data efficiency, or whether it's better to just do data augmentation and learn the symmetries. In my own experiments (and in other papers that have looked at this), equivariance always outperformed data augmentation by a large margin (on problems with exact symmetries), and data augmentation never managed to learn the symmetries accurately. That is perhaps not surprising, given that in typical setups the number of epochs is limited, so each data point is only augmented a few times.

Still, many "scale is all you need" folks believe that one should prefer data augmentation (or no bias at all) because eventually, with enough compute and data scale, the more general and scalable method will win (The Bitter Lesson). However, is data augmentation really more scalable? Scalability means how fast a method improves with data and compute, and for how long it keeps improving. This is exactly what equivariant nets are good at! We use transformers rather than N-grams for language because they are more data efficient, more scalable, and better adapted to that problem domain. Paraphrasing Ilya Sutskever: scale is not all you need; it matters what you scale.

In this latest work we decided to study the scaling behavior of equivariant networks empirically. As Johann explains in the thread below, we confirmed that equivariant networks are more data efficient. Interestingly, we were also able to confirm the intuition that, in principle, the network can learn the symmetry on its own: when data augmentation is applied at sufficient scale, you get the same sample-efficiency benefits as equivariance. HOWEVER: you need a huge number of epochs (which people don't do in practice), making equivariant networks more efficient and scalable in terms of training compute. So equivariant networks let you get the statistical benefits without paying the computational cost.

The takeaway for me is that if you are working on a problem with exact symmetries, and you are working on it because it is intrinsically important (climate, materials science / chemistry, molecular biology, etc.) rather than as a stepping stone to a more general problem (where the inductive bias could fail), then equivariant nets are still a good candidate in the age of scaling laws. Awesome work @johannbrehmer @pimdehaan Sönke Behrends!
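To make the augmentation-vs-equivariance contrast concrete, here is a minimal sketch (my own illustration, not the paper's code) for the C4 group of 90-degree image rotations: the augmentation route rotates training inputs at random and hopes the network learns the symmetry over many epochs, while the equivariant route averages features over the group orbit so the output is invariant by construction. All names here (PlainCNN, augment_c4, C4InvariantCNN) are hypothetical.

```python
# Sketch contrasting data augmentation with built-in C4 (90-degree rotation)
# invariance. Hypothetical illustration, not the code from the paper.
import torch
import torch.nn as nn

class PlainCNN(nn.Module):
    """Baseline with no built-in symmetry; relies on data augmentation."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def augment_c4(x: torch.Tensor) -> torch.Tensor:
    """Data augmentation: rotate the batch by a random multiple of 90 degrees."""
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(x, k, dims=(-2, -1))

class C4InvariantCNN(nn.Module):
    """Invariant by construction: average features over the C4 orbit."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        # Group averaging: run the backbone on all four rotations and pool.
        feats = torch.stack(
            [self.backbone(torch.rot90(x, k, dims=(-2, -1))) for k in range(4)]
        ).mean(dim=0)
        return self.head(feats)

x = torch.randn(8, 1, 28, 28)
model = C4InvariantCNN()
# Exact invariance holds with zero augmentation epochs:
assert torch.allclose(model(x), model(torch.rot90(x, 1, dims=(-2, -1))), atol=1e-5)
```

Note the compute trade-off the post describes: the group-averaged forward pass costs 4x a plain forward pass, but the augmentation route has to revisit each sample under many random rotations across epochs to approximate the same invariance it never guarantees exactly.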