fabianbaumann

29 posts

@ftw_baumann

Joined January 2020
280 Following · 12 Followers
fabianbaumann retweeted
Boris Cherny (@bcherny)
I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.
Replies 1.3K · Reposts 7K · Likes 54.4K · Views 8.1M
fabianbaumann retweeted
Scott Page (@Scott_E_Page)
University of Michigan has an open assistant professor position in Complex Systems. Amazing opportunity to join an interdisciplinary community of faculty and students. Looking for disciplined thinkers who transcend disciplines. lsa.umich.edu/cscs/news-even…
Replies 3 · Reposts 63 · Likes 163 · Views 25.9K
fabianbaumann retweeted
Mohammad Atari (@MohammadAtari90)
🔔 New paper out in @PNASNews 🔔 “Large Language Models based on historical text could offer informative tools for behavioral science” W/ Michael Varnum, Nicolas Baumard, & @kurtjgray
Replies 5 · Reposts 83 · Likes 307 · Views 39K
fabianbaumann retweeted
Christoph Riedl (@criedl)
Large study shows humans can learn from AI feedback but access to AI also amplifies existing inequalities by increasing the skill gap and reduces intellectual diversity: everyone learns to specialize in the same areas arxiv.org/abs/2409.18660
Replies 1 · Reposts 24 · Likes 71 · Views 9.3K
fabianbaumann retweeted
Andrej Karpathy (@karpathy)
Haha we've all been there. I stumbled on this tweet earlier today and tried to write a little utility that auto-generates a git commit message based on the git diff of staged changes. Gist: gist.github.com/karpathy/1dd02… So just typing `gcm` (short for git commit -m) auto-generates a one-line commit message, and lets you accept, edit, regenerate or cancel. Might be fun to experiment with. Uses the excellent `llm` CLI util from @simonw llm.datasette.io/en/stable/
Replies 187 · Reposts 330 · Likes 4.8K · Views 594.5K
fabianbaumann retweeted
Ricard Solé (@ricard_sole)
Our project for experimentally testing planetary regulation (Lovelock & Margulis) in the test tube, using a synthetic microbial Gaian system (+1 PhD), has been funded by @AgEInves. A great adventure that started at @sfiscience 2 yrs ago with @VictorVmaull & @jordiplam
Replies 4 · Reposts 38 · Likes 183 · Views 8.6K
fabianbaumann retweeted
erwan plantec (@eplantec)
Super excited to share our new work (and the first of my PhD): "Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning". We propose Lifelong Neural Developmental Programs for continually self-organizing artificial neural networks!
Replies 9 · Reposts 65 · Likes 481 · Views 55.6K
fabianbaumann retweeted
Thomas F. Varley (@ThosVarley)
I've gotten quite a few cold emails from students who found this long review I wrote useful and I'd like to do...something with it. It's way too long to submit as a stand-alone paper to, say, Entropy, though. Does anyone publish big tutorial reviews like this? arxiv.org/abs/2304.12482
Replies 12 · Reposts 77 · Likes 423 · Views 59.5K
fabianbaumann retweeted
Andrej Karpathy (@karpathy)
Anyone else find themselves estimating the "GPT grade" of things you hear/read? When something is poorly written or generic, it's "GPT-2 grade" content. When something is lit, you can compliment it as being "GPT-7 grade" etc.

This reminds me of a fun side project I had saved for myself but will realistically never get around to; maybe someone can take a shot. Simply: train a classifier that predicts the GPT-grade of any text. The training data would be samples from models of increasing strength. It might be that GPT models are too coarse and that too much changed between each one. Ideally you'd want a nice miniseries where everything is held constant except the model size, e.g. the Llama 3 series, esp. when they also release the smaller (and bigger!) models. Sample from the models over many prompts (or use base models?), classify the model size, then point it at various text on the internet, e.g. study the divergence between the comments section of WSJ and VC thought leadership :p.

To be clear, I have no idea if this would work; e.g. the classifier might very well latch on to the style a lot more than the content. Or it might measure not exactly an "intelligence" of text, but more just a "generic-ness", a proxy for frequency or so. It might also be an interesting way to study what is learned as you increase model size. But that's why it's an interesting project - it feels like it might kind of work, but it's not obvious and a number of details are tbd. Eye candy: ChatGPT attempts to visualize the above
Replies 67 · Reposts 73 · Likes 1.2K · Views 250.2K
fabianbaumann retweeted
Ryan Liu (@theryanliu)
Honesty and helpfulness are two central goals of LLMs. But what happens when they are in conflict with one another? 😳 We investigate trade-offs LLMs make, which values they prioritize, and how RLHF and Chain-of-Thought influence these trade-offs: arxiv.org/abs/2402.07282 [1/3]
Replies 1 · Reposts 12 · Likes 60 · Views 11.1K
fabianbaumann retweeted
Jurg Spaak (@JurgSpaak)
In our new paper in @Ecology_Letters we apply modern coexistence theory to higher-order species interactions, where we compute niche and fitness differences as changing over time. Thanks to Agnieszka Majer, Anna Skoracha (not on twitter) and @L_Kuczynski doi.org/10.1111/ele.14… 1/7
Replies 1 · Reposts 9 · Likes 26 · Views 2.5K
fabianbaumann retweeted
Ricard Solé (@ricard_sole)
How can we model the collective behavior of complex systems with many dimensions? Is it possible to find a model reduction that captures the key components? Check this great paper in @PhysRevLett & how to use it in many different contexts (networks+Ising) physics.byu.edu/faculty/transt…
Replies 1 · Reposts 33 · Likes 143 · Views 11.7K
fabianbaumann retweeted
Andrej Karpathy (@karpathy)
Highly amusing update, ~18 hours later: llm.c is now down to 26.2ms/iteration, exactly matching PyTorch (tf32 forward pass). We discovered a bug where we incorrectly called cuBLAS in fp32 mathmode 🤦‍♂️. And ademeure contributed a more optimized softmax kernel for very long rows (50,257 elements per row, in the last logits layer). But the fun doesn’t stop because we still have a lot of tricks up the sleeve. Our attention kernel is naive attention, not flash attention, and materializes the (very large) preattention and postattention matrices of sizes (B, NH, T, T), also it makes unnecessary round-trips with yet-unfused GeLU non-linearities and permute/unpermute inside our attention. And we haven’t reached for more optimizations, e.g. CUDA Graphs, lossless compressible memory (?), etc. So the updated chart looks bullish :D, and training LLMs faster than PyTorch with only ~2,000 lines of C code feels within reach. Backward pass let’s go.
Quoted tweet — Andrej Karpathy (@karpathy):

A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈 The biggest improvements were:

- Turn on TF32 (NVIDIA TensorFloat-32) instead of FP32 for matmuls. This is a new math mode in GPUs starting with Ampere+. This is a very nice, ~free optimization that sacrifices a little bit of precision for a large increase in performance, by running the matmuls on tensor cores while chopping the mantissa down to only 10 bits (the least significant 13 bits of the float get lost). So the inputs, outputs and internal accumulates remain in fp32, but the multiplies are lower precision. Equivalent to PyTorch `torch.set_float32_matmul_precision('high')`.
- Call the cuBLASLt API instead of cuBLAS for the sGEMM (fp32 matrix multiply), as this also lets you fuse the bias into the matmul and removes the need for a separate add_bias kernel, which caused a silly round trip to global memory for one addition.
- A more efficient attention kernel that uses 1) cooperative_groups reductions that look much cleaner and that I only just learned about (they are not covered by the CUDA PMPP book...), 2) the online softmax algorithm used in flash attention, 3) a fused attention scaling-factor multiply, 4) "built in" autoregressive mask bounds. (Big thanks to ademeure, ngc92, lancerts on GitHub for writing / helping with these kernels!)

Finally, ChatGPT created this amazing chart to illustrate our progress. 4 days ago we were 4.6X slower, today we are 2X slower. So we are going to beat PyTorch imminently 😂 Now (personally) going to focus on the backward pass, so we have the full training loop in CUDA.

Replies 156 · Reposts 531 · Likes 6K · Views 1.1M
fabianbaumann retweeted
Andrej Karpathy (@karpathy)
THE REVENGE OF PYTORCH just kidding :) @cHHillee (from the PyTorch team) was kindly able to help improve the PyTorch baseline, done by 1) upgrading to nightly, 2) using the "compound" F.sdpa (scaled dot product attention) layer directly, and 3) turning on a torch compile flag: TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1. The numbers are a bit different because this is a slightly different GPU (A100 80GB, with higher memory bandwidth), but:

llm.c: 23.026892ms
PyTorch 2.2: 22.408ms
PyTorch nightly: 21.090ms
PyTorch nightly + F.sdpa: 19.224ms
PyTorch nightly + F.sdpa + coordinate descent tuning torch inductor flag: 18.809ms

So ~20% speedup; see the fork for more details: github.com/Chillee/llm.c?… Another nice attached pointer is that torch compile can also generate and emit C++ code: github.com/Chillee/llm.c/…
Replies 28 · Reposts 46 · Likes 1.2K · Views 296K
fabianbaumann retweeted
Andrej Karpathy (@karpathy)
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: github.com/karpathy/llm.c To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly matches the PyTorch reference implementation. I chose GPT-2 to start because it is the grand-daddy of LLMs, the first time the LLM stack was put together in a recognizably modern form, and with model weights available.
Replies 285 · Reposts 1.8K · Likes 12.5K · Views 1.7M