Lukas Fisch
@codingfisch

Don't panic! Just build! https://t.co/FNuUXBB6HY

Joined December 2013
101 Following · 200 Followers
128 posts
Lukas Fisch@codingfisch·
"deepmriprep: VBM preprocessing via deep neural networks" is published in Nature Computational Science 🧠💻 🔗 Paper: rdcu.be/e1t4N VBM preprocessing in ~10 seconds per #brain image 🚀 🔗 GitHub: github.com/wwu-mmll/deepm… Install via "pip install deepmriprep"
Lukas Fisch retweeted
TimHahn@TheRealTimHahn1·
This really speeds up preprocessing and shows - yet again - that neural networks are eating software. Great work, @codingfisch !! Happy to see this finally published.
Nature Computational Science@NatComputSci

📢Out now! @codingfisch and colleagues present deepmriprep, a tool that leverages neural networks to make Voxel-based Morphometry preprocessing of MRI data 37x faster than existing methods. nature.com/articles/s4358…

John Carmack@ID_AA_Carmack·
Always a slightly mixed feeling to write pretty good first-principles code to do some tensor rearrangement, only to find that PyTorch has a built in function that does it faster. I had made a point of at least skimming the docs of every torch and tensor function, but if you don’t know what you would use something for, it probably won’t stick in your head for when the opportunity arises. Pixel_unshuffle, looking at you.
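For reference, the rearrangement `pixel_unshuffle` performs can be written from first principles in a few lines. This is an illustrative NumPy sketch of the same reshape/transpose trick (the built-in `torch.nn.functional.pixel_unshuffle` does this on tensors, typically faster); the function here is just for the demo:

```python
import numpy as np

def pixel_unshuffle(x, r):
    """Sketch of the rearrangement torch.pixel_unshuffle performs:
    fold each r x r spatial block into the channel dimension,
    (N, C, H*r, W*r) -> (N, C*r*r, H, W)."""
    n, c, hr, wr = x.shape
    h, w = hr // r, wr // r
    # Split each spatial dim into (block index, offset within block)...
    x = x.reshape(n, c, h, r, w, r)
    # ...move the offsets next to the channel dim...
    x = x.transpose(0, 1, 3, 5, 2, 4)
    # ...and merge them into channels.
    return x.reshape(n, c * r * r, h, w)

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
y = pixel_unshuffle(x, 2)
print(y.shape)  # (1, 4, 2, 2)
```

Output channel `c*r*r + i*r + j` holds the input pixels at offset `(i, j)` within each block, matching the PyTorch semantics.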
Lukas Fisch retweeted
the tiny corp@__tinygrad__·
Our GPU stack for both NVIDIA and AMD, aside from minimal pieces of signed firmware, is 100% open source and pure Python except for the compiler. It's not using vendor drivers, frameworks, or libraries. That's why it's so easy to make it work on Mac.

For compilers, on AMD, we use upstream LLVM, and on NVIDIA, we use the NAK compiler from the MESA project. We plan to replace the compiler with pure tinygrad in a year or two as well.

With RANGEIFY merged, our lowering stuff now matches the state of the art, TVM style. We're studying ThunderKittens and TileLang for speed at that level, and should have all this stuff ready in 200 days for the due date of our AMD Llama 405B training contract.

Due to tinygrad's small size and pure Python nature, it's the easiest ML library to make progress on, aka fastest slope of improvement. With Megakernel style for scheduling, MODeL_opt style for planning, and E-graph style for symbolic, we should blow past the state of the art in PyTorch and JAX speed. If we do that, NVIDIA's moat is over.

It's 1000 lines at most to add a new accelerator to tinygrad. And I don't mean to add a new accelerator with help from a kernel driver, compiler, and libraries. Just 1000 lines of software for the *whole* accelerator speaking right on the PCIe BARs, like what tinygrad is doing with the NVIDIA and AMD GPUs now.
Lukas Fisch@codingfisch·
@eigenron Even better: rollout alone is even faster! You can use the rollout function (see flashrl/main.py).
eigenron@eigenron·
@codingfisch i'm not training rn just collecting data. and i already did. but thanks, will check it out.
eigenron@eigenron·
i know gpu parallelization is in the hype rn but i just parallelized my rollout collection for a world model implementation across 10 CPU cores on my new MacBook Air M4. holy shit the collection for 10,000 rollouts for the car-racing-v3 env went from ~5 mins to <1.25 mins. this is called embarrassingly parallel processing. zero gpu, full throttle cpu baby.
Lukas Fisch@codingfisch·
@eigenron Try "pip install flashrl" to make it even faster (on CPU and GPU). Write 6 lines of code to train Pong in a few seconds!
Lukas Fisch@codingfisch·
Good summary but "frontier LLM researchers...shifted a little too much into exploit mode" is an understatement. A large chunk of ALL AI researchers bet on scaling up LLMs to AGI. If this bet fails, we will have spent a lot of researcher FLOPs in a local optimum. New small-scale RL ideas are needed!
Andrej Karpathy@karpathy·
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or worth even pursuing. The underlying assumption being that LLMs are of course highly "bitter lesson pilled" indeed, just look at LLM scaling laws where if you put compute on the x-axis, number go up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent a human bias? So there you have it, bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough! In some sense, Dwarkesh (who represents the LLM researchers viewpoint in the pod) and Sutton are slightly speaking past each other because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes the original concept of Alan Turing of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). 
Another important note he makes is that even if you just treat pretraining as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course, a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is an interaction with a world via reinforcement learning, where the reward functions are partially environment specific, but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default, it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom instead of what differentiates us. "If we understood a squirrel, we'd be almost done". As for my take... First, I should say that I think Sutton was a great guest for the pod and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not without merit. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all the stages - the foundation (the pretraining data) is all human text, the finetuning data is human and curated, the reinforcement learning environment mixture is tuned by human engineers. We do not in fact have an actual, single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "example proofs" are commonly offered to argue that such a thing is possible.
The first example is the success of AlphaZero learning to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed, environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant whether it's appropriate because animals arise by a very different computational process and via different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting. Example. A baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization in the course of evolution. If the baby zebra spasmed its muscles around at random as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all. Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high information density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes it is basically supervised learning that is ~absent in the animal kingdom. 
But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively. I still think it is worth being inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double digit percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise. So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are these imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possible to me that over time, we can further finetune our ghosts more and more in the direction of animals; that it's not so much a fundamental incompatibility but a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.
Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear-shifted a little too much into exploit mode. Probably we are still not sufficiently bitter lesson pilled and there is a very good chance of more powerful ideas and paradigms, other than exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration. Intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
Dwarkesh Patel@dwarkesh_sp

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead-end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI

Joseph Suarez 🐡@jsuarez·
@codingfisch @ID_AA_Carmack @clashluke The experiments were rigorous, their documentation was not. I am making up for that by doing quality large-scale ablations now. It's muon for most envs, adv filtering for some envs, minor bump from my new adv fn
Joseph Suarez 🐡@jsuarez·
What's the last research topic where you've been completely, unambiguously wrong? I'll go first: I thought optimizer research was just a nerd snipe for a decade. Then I integrated Muon into PufferLib and fully reswept hyperparams. Step change in capabilities, core default in 3.0.
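For context, Muon's core trick is to apply a momentum-smoothed gradient only after approximately orthogonalizing it with a Newton-Schulz iteration. A minimal NumPy sketch of that step, using the quintic coefficients from the public Muon reference implementation; this is an illustration, not PufferLib's actual integration:

```python
import numpy as np

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize a matrix, as in Muon: push all
    singular values of g toward 1 without an explicit SVD.
    Quintic coefficients follow the public Muon reference code."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius-normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        m = x @ x.T
        x = a * x + (b * m + c * m @ m) @ x
    return x.T if transposed else x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update: momentum accumulation, then an orthogonalized step."""
    momentum = beta * momentum + grad
    w = w - lr * newton_schulz(momentum)
    return w, momentum

rng = np.random.default_rng(0)
g = rng.standard_normal((8, 8))
o = newton_schulz(g)
print(np.linalg.svd(o, compute_uv=False))  # singular values pushed toward 1
```

The point of orthogonalizing is that the update has roughly uniform scale across all directions of the weight matrix, instead of being dominated by a few large singular values of the gradient.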
Lukas Fisch@codingfisch·
@jsuarez @ID_AA_Carmack @clashluke It would really help if you quantified stuff like this rigorously: Muon vs Adam(W) across different envs (with hyperparameter sweeps if needed) and (~10) different random seeds. Would be interesting to see which changes in puffer bring the largest performance increase.
Joseph Suarez 🐡@jsuarez·
@ID_AA_Carmack @clashluke I didn't quantify the shift tbh, just that when I fully reswept, we beat every benchmark internally. Are you getting anything out of the W part of AdamW? I haven't seen any benefits in the high data regime
Lukas Fisch@codingfisch·
@fchollet Google "Solomonoff induction" and "AIXI model" 😉 @mhutter42 formalized this idea very nicely. Unfortunately uncomputable 🙈
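For reference, Solomonoff's universal prior weights every program that reproduces the observed data by two to the minus its length, and Hutter's AIXI plugs that prior into expectimax planning. Schematically (notation simplified; see Hutter's formulation for the exact definitions):

```latex
% Solomonoff's universal prior: sum over all programs p whose output
% on a universal prefix Turing machine U begins with x
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% AIXI (schematic): expectimax over action/percept sequences up to
% horizon m, with environments weighted by the universal mixture \xi
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \Big( \sum_{k=t}^{m} r_k \Big)\, \xi(o_{1:m} r_{1:m} \mid a_{1:m})
```

Both quantities sum over all programs, which is exactly why they are uncomputable, as the tweet notes.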
François Chollet@fchollet·
No theory feels true to me unless it is simple (relative to what it is explaining). The solution most likely to generalize is always the simplest one.
Lukas Fisch@codingfisch·
@charliermarsh After "pip install viiew" it also supports scrolling through your array/tensor/dataframe
Charlie Marsh@charliermarsh·
Extremely nice quality-of-life thing: Python 3.14 includes syntax highlighting in the REPL
Lukas Fisch retweeted
Thomas Kipf@tkipf·
The curse of dimensionality. The blessing of compositionality.
Lukas Fisch retweeted
François Chollet@fchollet·
When you store your knowledge and skills as parametric curves (as all deep learning models do), the only way you can generalize is via interpolation on the curve. The problem is that interpolated points *correlate* with the truth but have no *causal* link to the truth. Hence hallucinations. The fix is to start leveraging causal symbolic graphs as your representation substrate (e.g. computer programs of the kind we write as software engineers). The human-written software stack, with its extremely high degree of reliability despite its massive complexity, is proof of existence of exact truthiness propagation.