Anders Larsen
@AndersSL
55 posts
Science and machine learning
Copenhagen · Joined October 2008
1.2K Following · 151 Followers
Anders Larsen retweeted
PyTorch @PyTorch
FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads. 🖇️ Read our latest blog here: hubs.la/Q045FHPh0 No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
[image] · 12 replies · 97 reposts · 732 likes · 100.6K views
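For readers who haven't tried FlexAttention: a custom variant is just a Python score_mod passed to flex_attention, and torch.compile generates the fused kernel (on Hopper/Blackwell, per the post above, the new FlashAttention-4 backend can be selected for you). A minimal sketch with an ALiBi-style bias of my own choosing, not an example from the post:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# ALiBi-style relative-position bias as a score_mod (illustrative variant,
# not one from the post). torch.compile is what generates the fused kernel.
def alibi_bias(score, b, h, q_idx, kv_idx):
    return score - 0.05 * (h + 1) * (q_idx - kv_idx)

q, k, v = (torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
compiled_attention = torch.compile(flex_attention)
out = compiled_attention(q, k, v, score_mod=alibi_bias)
```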
Anders Larsen retweeted
Marwin Segler @marwinsegler
Scalable emulation of protein equilibrium ensembles with generative deep learning => now out in Science! science.org/doi/10.1126/sc…
1 reply · 9 reposts · 50 likes · 3.5K views
Anders Larsen retweeted
Rianne van den Berg @vdbergrianne
🚀 After two-plus years of intense research, we're thrilled to introduce Skala, a scalable deep learning density functional that hits chemical accuracy on atomization energies and matches hybrid-level accuracy on main-group chemistry, all at the cost of semi-local DFT. ⚛️🔥🧪🧬
[image] · 5 replies · 61 reposts · 291 likes · 33.1K views
Anders Larsen retweeted
D. E. Shaw Research @DEShawResearch
Join us to work on LLMs for drug discovery, including scaling and optimizing large-model training and inference workflows on our cutting-edge infrastructure; pre-training, post-training, and multimodal learning; and integrating non-text modalities. apply.deshawresearch.com/careers/Regist…
0 replies · 9 reposts · 36 likes · 3.4K views
Anders Larsen retweeted
Seunghyun Seo @SeunghyunSEO7
By the way, I wrote a post about "how to scale" based on what I've learned over the past few months. It covers muP, HP scaling laws, and some other topics. I'd be happy to get any feedback or discussion. (It's pretty verbose and has no TL;DR, sorry!) howtoscalenn.github.io
[image] · 13 replies · 79 reposts · 711 likes · 63.9K views
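For context on the muP part of that post, here is a sketch of the usual rule of thumb, assuming Adam and with widths and learning rates of my own choosing (not code from the post): hidden-layer learning rates are scaled down with width so that hyperparameters tuned on a narrow proxy model transfer to the wide one.

```python
import torch
import torch.nn as nn

# Hypothetical muP-style per-layer learning rates (illustration only):
# hidden/output-like layers get lr scaled by base_width / width, so a
# hyperparameter sweep done at base_width transfers to the wider model.
base_width, width, base_lr = 256, 2048, 3e-3
model = nn.Sequential(nn.Linear(512, width), nn.ReLU(), nn.Linear(width, 512))
opt = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": base_lr},                      # input-like
    {"params": model[2].parameters(), "lr": base_lr * base_width / width}, # hidden-like
])
```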
Anders Larsen retweeted
Zhuang Liu @liuzhuang1234
New paper: Transformers, but without normalization layers (1/n)
[image] · 76 replies · 576 reposts · 4.1K likes · 1.3M views
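If this is the "Transformers without Normalization" paper, the drop-in LayerNorm replacement it proposes is Dynamic Tanh: y = γ · tanh(αx) + β with a learnable scalar α. A minimal sketch under that assumption:

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """Drop-in LayerNorm replacement: y = gamma * tanh(alpha * x) + beta,
    with a learnable scalar alpha and per-channel affine parameters."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

# Usage: swap nn.LayerNorm(dim) for DynamicTanh(dim) in a transformer block.
x = torch.randn(2, 16, 768)
print(DynamicTanh(768)(x).shape)  # torch.Size([2, 16, 768])
```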
Anders Larsen retweeted
Max Aifer @MaxAifer
A thread on our new paper, Thermodynamic Bayesian Inference. 250 years later, Bayes's theorem is still the gold standard for probabilistic reasoning. But for complicated models it's too hard to implement exactly, so approximations are used. For example, the complexity of Bayesian neural network posteriors makes them hard to sample from (see cims.nyu.edu/~andrewgw/bnnh…).
[image] · 10 replies · 118 reposts · 947 likes · 173.8K views
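For a sense of the software baseline such work competes with: posterior sampling is commonly done with Langevin-type dynamics, whose stationary distribution is the posterior. A toy sketch of the unadjusted Langevin algorithm on a one-parameter model (my illustration, not the paper's thermodynamic method):

```python
import torch

# Unadjusted Langevin algorithm (ULA) on a toy posterior: N(0, 1) prior on
# theta, N(theta, 1) likelihood for observed data. Illustration only.
data = torch.tensor([1.2, 0.8, 1.1])

def log_posterior(theta):
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * ((data - theta) ** 2).sum()
    return log_prior + log_lik

theta = torch.zeros(1, requires_grad=True)
step = 0.05
samples = []
for _ in range(5000):
    (grad,) = torch.autograd.grad(log_posterior(theta), theta)
    with torch.no_grad():
        # Langevin update: gradient step on log p plus Gaussian noise.
        theta += step * grad + (2 * step) ** 0.5 * torch.randn_like(theta)
    samples.append(theta.item())

# Sample mean approaches the analytic posterior mean, sum(data)/(n+1) = 0.775.
print(sum(samples[1000:]) / len(samples[1000:]))
```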
Anders Larsen @AndersSL
@cloneofsimo Can you use the same learning rate and weight decay as with Adam, or do you have to do a new search?
0 replies · 0 reposts · 0 likes · 684 views
Simo Ryu @cloneofsimo
Shampoo scaling law for language models: a plot in the style of Kaplan et al., but comparing Shampoo and Adam. Shampoo is literally such a free lunch at large scale, in a predictable manner.
[image] · 28 replies · 43 reposts · 475 likes · 166.2K views
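For readers unfamiliar with Shampoo: it preconditions each weight matrix's gradient with inverse fourth roots of accumulated Kronecker factors. A stripped-down sketch of the core update for one layer (my simplification; production versions add grafting, blocking, and amortized inverse-root computation):

```python
import torch

def matrix_inv_quarter_root(M, eps=1e-6):
    # Inverse fourth root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = torch.linalg.eigh(M + eps * torch.eye(M.shape[0]))
    return vecs @ torch.diag(vals.clamp_min(eps) ** -0.25) @ vecs.T

# Minimal single-matrix Shampoo step: precondition G with L^(-1/4) G R^(-1/4),
# where L = sum G G^T and R = sum G^T G are the Kronecker factor statistics.
torch.manual_seed(0)
W = torch.randn(64, 32)
L = torch.zeros(64, 64)
R = torch.zeros(32, 32)
lr = 1e-2

for step in range(10):
    G = torch.randn_like(W)  # stand-in for a real gradient
    L += G @ G.T
    R += G.T @ G
    W -= lr * matrix_inv_quarter_root(L) @ G @ matrix_inv_quarter_root(R)
```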
Anders Larsen retweeted
Horace He @cHHillee
For too long, users have lived under the software-lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10
[image] · 25 replies · 270 reposts · 1.5K likes · 287.6K views
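A concrete taste of those "few lines", using the public API with a sliding-window causal mask of my own choosing (not an example from the thread):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Sliding-window causal attention expressed as a mask_mod (example variant).
# flex_attention fuses this into a single kernel under torch.compile.
WINDOW = 256

def sliding_window_causal(b, h, q_idx, kv_idx):
    return (q_idx >= kv_idx) & (q_idx - kv_idx <= WINDOW)

q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
block_mask = create_block_mask(sliding_window_causal, B=None, H=None,
                               Q_LEN=1024, KV_LEN=1024, device="cuda")
compiled_attention = torch.compile(flex_attention)
out = compiled_attention(q, k, v, block_mask=block_mask)
```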
Anders Larsen @AndersSL
Our neural network, PhAI, accurately predicts phases from amplitudes without assumptions about the crystal's contents. No prior knowledge is needed, and PhAI does very well with low-resolution data (e.g., 2.0 Å).
0 replies · 0 reposts · 2 likes · 253 views
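Background on why predicted phases matter (standard crystallography, not a claim from the tweet): a diffraction experiment measures only the structure-factor amplitudes |F(hkl)|, yet reconstructing the electron density also requires the lost phases φ(hkl):

```latex
% The crystallographic phase problem: I(hkl) \propto |F(hkl)|^2 discards
% the phases, but the density synthesis needs them.
\[
  \rho(\mathbf{r}) \;=\; \frac{1}{V} \sum_{hkl} \lvert F(hkl)\rvert\,
  e^{i\varphi(hkl)}\, e^{-2\pi i (hx + ky + lz)}
\]
```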
Anders Larsen @AndersSL
Our research reveals that neural networks could improve structure solutions for weakly scattering crystals! This includes protein crystals, metal-organic frameworks, and the nanometer-sized crystals often seen in electron diffraction.
[image] · 1 reply · 0 reposts · 2 likes · 306 views
Anders Larsen retweeted
Lucas Beyer (bl16) @giffmana
This is exactly what I hate about all big frameworks. TF is terrible. PyTorch used to be straightforward but turned terrible too. Torch7 was very direct. JAX/Flax is still OK, but I pray every day that it doesn't suffer the same fate over time.
[image]
Quoting Andrej Karpathy @karpathy:
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: github.com/karpathy/llm.c To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly matches the PyTorch reference implementation. I chose GPT-2 to start because it is the grand-daddy of LLMs, the first time the LLM stack was put together in a recognizably modern form, and its model weights are available.
26 replies · 42 reposts · 674 likes · 146.5K views