
fabianbaumann
🚨 New preprint 🚨 In "Beyond the here and now: Counterfactual simulation in causal cognition", I discuss what role counterfactual simulation plays in how people judge causation and assign responsibility. 📰 osf.io/preprints/psya…

Our latest Path to 2024 report examines attitudes toward corporate political activism. Only 27.8% of Americans support corporations taking stances on social issues, with more Dems (39%) expressing support than Reps (23.3%). Read the full report: prlpublic.s3.amazonaws.com/reports/May202…

A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass), compared to 4 days ago, when it was 4.2X slower 📈 The biggest improvements were:

- Turn on TF32 (NVIDIA TensorFloat-32) instead of FP32 for matmuls. This is a new math mode in GPUs starting with Ampere+. It is a very nice, ~free optimization that sacrifices a little bit of precision for a large increase in performance by running the matmuls on tensor cores while chopping the mantissa down to only 10 bits (the 13 least significant mantissa bits of the float get lost). So the inputs, outputs, and internal accumulates remain in fp32, but the multiplies are lower precision. Equivalent to PyTorch `torch.set_float32_matmul_precision('high')`.
- Call the cuBLASLt API instead of cuBLAS for the SGEMM (fp32 matrix multiply), as this also lets you fuse the bias into the matmul and removes the need for a separate add_bias kernel, which caused a silly round trip to global memory for one addition.
- A more efficient attention kernel that uses 1) cooperative_groups reductions, which look much cleaner and which I only just learned about (they are not covered by the PMPP book...), 2) the online softmax algorithm used in flash attention, 3) a fused attention scaling-factor multiply, and 4) "built-in" autoregressive mask bounds. (Big thanks to ademeure, ngc92, lancerts on GitHub for writing / helping with these kernels!)

Finally, ChatGPT created this amazing chart to illustrate our progress. 4 days ago we were 4.2X slower, today we are 2X slower. So we are going to beat PyTorch imminently 😂 Now (personally) going to focus on the backward pass, so we have the full training loop in CUDA.
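To make the TF32 trade-off concrete: a small standalone Python sketch (not from llm.c) that emulates what the tensor cores do to each multiplicand, zeroing the 13 least-significant mantissa bits of an fp32 value so only the top 10 mantissa bits survive. The function name `tf32_round` is my own; the bit layout (1 sign + 8 exponent + 23 mantissa bits for fp32) is standard IEEE 754.

```python
import struct

def tf32_round(x: float) -> float:
    """Emulate TF32 input truncation: keep the top 10 of fp32's 23
    mantissa bits by zeroing the 13 least-significant ones.
    (Hypothetical helper for illustration; real TF32 runs on tensor cores.)"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # fp32 -> raw uint32
    bits &= ~((1 << 13) - 1)                             # chop low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Values whose mantissa fits in 10 bits pass through unchanged...
print(tf32_round(1.5))            # -> 1.5
# ...while detail finer than 2**-10 relative to the leading bit is lost.
print(tf32_round(1.0 + 2**-23))   # -> 1.0
```

Note this truncation applies only to the multiply inputs; as the post says, the accumulation still happens in full fp32, which is why the end-to-end precision loss is usually tolerable.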

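The online softmax trick mentioned for the attention kernel can be sketched in a few lines of Python (a standalone illustration, not the actual CUDA kernel): instead of one pass to find the max and another to sum the exponentials, it maintains a running max and rescales the running sum whenever the max changes, which is what lets flash attention process the sequence in a single streaming pass.

```python
import math

def softmax_naive(xs):
    """Two-pass softmax: find max, then exponentiate and normalize."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_online(xs):
    """Online softmax: one streaming pass maintains the running max m
    and the running sum s, rescaling s by exp(m_old - m_new) whenever
    a new maximum is seen (as in flash attention)."""
    m = float("-inf")
    s = 0.0
    for x in xs:
        m_new = max(m, x)
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / s for x in xs]

print(softmax_online([1.0, 2.0, 3.0]))  # matches softmax_naive
```

Subtracting the running max before exponentiating also gives the usual numerical stability for free, since no intermediate ever exceeds exp(0) = 1.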


