Seung Joon Choi
@erucipe
8.8K posts
Korea · Joined July 2009
1.8K Following · 1.4K Followers
Seung Joon Choi reposted
Andrej Karpathy @karpathy
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.

As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or is even worth pursuing. The underlying assumption is that LLMs are of course highly "bitter lesson pilled" - just look at LLM scaling laws, where if you put compute on the x-axis, the number goes up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent human bias? So there you have it: bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough!

In some sense, Dwarkesh (who represents the LLM researchers' viewpoint in the pod) and Sutton are slightly speaking past each other, because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes Alan Turing's original concept of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point, but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). Another important point he makes is that even if you treat pretraining merely as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course - a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is interaction with a world via reinforcement learning, where the reward functions are partially environment specific, but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default; it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom than in what differentiates us. "If we understood a squirrel, we'd be almost done."

As for my take... First, I should say that I think Sutton was a great guest for the pod, and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not unfounded. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all stages: the foundation (the pretraining data) is all human text, the finetuning data is human and curated, and the reinforcement learning environment mixture is tuned by human engineers.

We do not in fact have a single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough.

Two "example proofs" are commonly offered to argue that such a thing is possible. The first is the success of AlphaZero, which learned to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe.

The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant about whether it's appropriate, because animals arise from a very different computational process and under very different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting. Example: a baby zebra is born and, within a few dozen minutes, it can run around the savannah and follow its mother. This is a highly complex sensory-motor task, and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization of evolution. If the baby zebra spasmed its muscles around at random, as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all.

Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high-information-density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes, it is basically supervised learning, which is ~absent in the animal kingdom. But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively.

I still think it is worth being inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double-digit-percent uncertainty, and I cheer the work of those who disagree, especially those who are a lot more ambitious bitter-lesson-wise.

So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possible to me that over time we can finetune our ghosts further and further in the direction of animals; that it's not so much a fundamental incompatibility as a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.

Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be shifted a little too far into exploit mode. Probably we are still not sufficiently bitter lesson pilled, and there is a very good chance of more powerful ideas and paradigms beyond exhaustive benchbuilding and benchmaxxing. Animals might be a good source of inspiration: intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
Dwarkesh Patel @dwarkesh_sp

.@RichardSSutton, father of reinforcement learning, doesn't think LLMs are bitter-lesson-pilled.

My steel man of Richard's position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on the fly, like all humans and, indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead-end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI

Alex Mordvintsev @zzznah
Ten years ago I woke up in the middle of the night from an uncanny dream. Unable to sleep, I decided to try an experiment that had been on my mind for days. In half an hour, the strange phenomenon that would flood the internet that summer was born. Happy DeepDream day! youtu.be/YqhdzaclxKo
Alex Mordvintsev tweet media
Kate Vass Studio @KateVassGalerie

COLLECTOR’S CHOICE 1/ This month celebrates the 10th anniversary of DeepDream, an important development in the history of AI-generated art. Introduced in May 2015 by Alexander Mordvintsev @zzznah, a researcher and artist based in Zurich, DeepDream was one of the first widely recognized applications of neural networks for image generation. It played a major role in popularizing AI art, inspiring a wave of experimentation that continues among many artists today. Image: Just before DeepDream: 1000 classes #3, 2015/01 by Alexander Mordvintsev

Seung Joon Choi @erucipe
@dwarkesh_sp @stripepress I noticed a small error in your book (Kindle). "People have been testing whether these models are overfit to the benchmarks. Scale AI recently did this with GSM8K. [See Figure 8.]" I believe it should refer to Figure 7, not Figure 8. Anyway, it's a great read. Thank you!
Dwarkesh Patel @dwarkesh_sp
I'm so pleased to present a new book with @stripepress: "The Scaling Era: An Oral History of AI, 2019-2025." Over the last few years, I interviewed the key people thinking about AI: scientists, CEOs, economists, philosophers. This book curates and organizes the highlights across all these conversations. You get to see thinkers across many, many fields address the same gnarly questions: “What is the true nature of intelligence? What will change from the millions of machine intelligences running around? What exactly will it take to get there?” Settled answers are unavailable; we’re all running unsupervised. But between these discussions lie, I hope, some insights on the most interesting and important questions of our era. Link below. Enjoy!
Dwarkesh Patel tweet media
Andrej Karpathy @karpathy
Seeding my Bear ʕ•ᴥ•ʔ blog with more random posts, e.g. here's something I had on the backlog for a while:

# The append-and-review note

An approach to note taking that I stumbled on and that has worked quite well for me for many years. I find that it strikes a good balance: it is super simple and easy to use, but it also captures the majority of day-to-day note taking use cases.

Data structure. I maintain one single text note in the Apple Notes app, just called "notes". Maintaining more than one note and managing and sorting them into folders and recursive substructures costs way too much cognitive bloat. A single note means CTRL+F is simple and trivial. Apple does a good job of optional offline editing, syncing between devices, and backup.

Append. Any time an idea, a todo, or anything else comes to mind, I append it to the top of the note, simply as text - either on my computer when working, or on my iPhone when on the go. I don't find that tagging these notes with any other structured metadata (dates, links, concepts, tags) is that useful, and I don't do it by default. The only exception is that I use tags like "watch:", "listen:", or "read:", so they are easy to CTRL+F for when I'm looking for something to watch late at night, listen to during a run/walk, or read during a flight, etc.

Review. As things get added to the top, everything else starts to sink towards the bottom, almost as if under gravity. Every now and then, I fish through the notes by scrolling down and skimming. If I find anything that deserves to stay in my attention, I rescue it towards the top by simply copy pasting. Sometimes I merge, process, group, or modify notes when they seem related. I delete a note only rarely. Notes that repeatedly don't deserve attention will naturally continue to sink. They are never lost; they just don't deserve top of mind.

Example usage:
- A totally random idea springs to mind but I'm on the go and can't think about it, so I add it to the note to get back to later.
- Someone at a party mentions a movie I should watch.
- I see a glowing review of a book while doom scrolling through X.
- I sit down in the morning and write a small TODO list for what I'd like to achieve that day.
- I just need some writing surface for something I'm thinking about.
- I was going to post a tweet but I think it needs a bit more thought. Copy paste into notes to think through a bit more later.
- I find an interesting quote and I want to be reminded of it now and then.
- My future self should really think about this thing more.
- I'm reading a paper and I want to note down some interesting numbers.
- I'm working on something random and I just need a temporary surface to CTRL+C and CTRL+V a few things around.
- I keep forgetting that shell command that lists all Python files recursively, so now I keep it in the note.
- I'm running a hyperparameter sweep of my neural network and I record the commands I ran and the eventual outcome of the experiment.
- I feel stressed that there are too many things on my mind and I worry that I'll lose them, so I just sit down and quickly dump them into a bullet point list.
- I realize while re-ordering some of my notes that I've actually thought about the same thing a lot, but from different perspectives. I process it a bit more and merge some of the notes into one. I feel additional insight.

When I note something down, I feel that I can immediately move on, wipe my working memory, and focus fully on something else, with confidence that I'll be able to revisit the idea later during review and process it when I have more time.

My note has grown quite giant over the last few years. It feels nice to scroll through some of the old things/thoughts that occupied me a long time ago. Sometimes ideas don't stand the repeated scrutiny of review and just sink deeper down. Sometimes I'm surprised that I've thought about something for so long. And sometimes an idea from a while ago is suddenly relevant in a new light. One text note ftw.
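[Editor's note: for readers who prefer code to prose, here is a minimal sketch of the same append-and-review loop. It assumes a plain local text file rather than Apple Notes, and the file name and helper functions are illustrative inventions, not anything from the post itself.]

```python
# Minimal sketch of append-and-review over a single plain-text note.
# Assumptions: one entry per line, file "notes.txt" (not Karpathy's actual setup).
from pathlib import Path

NOTE = Path("notes.txt")

def append(entry: str) -> None:
    """Prepend a new entry so the newest thought is always at the top."""
    old = NOTE.read_text() if NOTE.exists() else ""
    NOTE.write_text(entry.rstrip() + "\n" + old)

def review(n: int = 20) -> list[str]:
    """Skim the most recent n entries; older ones keep sinking toward the bottom."""
    lines = NOTE.read_text().splitlines() if NOTE.exists() else []
    return lines[:n]

def rescue(substring: str) -> None:
    """Move the first entry containing substring back to the top (the 'copy paste' step)."""
    lines = NOTE.read_text().splitlines() if NOTE.exists() else []
    for i, line in enumerate(lines):
        if substring in line:
            lines.insert(0, lines.pop(i))
            break
    NOTE.write_text("\n".join(lines) + "\n")

append("read: The Scaling Era")
append("watch: DeepDream anniversary talk")
print(review())
```

Prepending keeps the newest entries at the top, so review is just reading from the start of the file and rescue is a move-to-front; nothing is ever deleted by default.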
Andrej Karpathy tweet media
Seung Joon Choi @erucipe
@hwchung27 @MIT Thank you for the great talk! It really helped me form a better perspective, so I translated it into Korean to make it more accessible to a wider Korean audience. "가르치지 말고, 인센티브를 부여하라" ("Don't teach; incentivize") docs.google.com/document/d/1Vg…
Hyung Won Chung @hwchung27
Here is my talk at @MIT (after some delay 😅). I made this talk last year when I was thinking about a paradigm shift. This delayed posting is timely, as we just released o1, which I believe is a new paradigm. It's a good time to zoom out for high-level thinking. (1/11)
Seung Joon Choi @erucipe
@jlfwong @zzznah I used to enjoy playing LocoRoco, a platform video game released in 2006 for the PlayStation Portable. The game's unique mechanics and charming style, created by designer Tsutomu Kouno, seem to have influenced me unconsciously. LocoRoco: en.wikipedia.org/wiki/LocoRoco
Seung Joon Choi @erucipe
Thanks to @jlfwong for the Metaball tutorial (2014) and @zzznah for the Particle Lenia tutorial (2023). I put the code from these two tutorials into Claude's context, and Claude suggested using DBSCAN for clustering, which was a great starting point for this work.
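[Editor's note: as context for the DBSCAN suggestion, here is a hedged sketch of clustering particle positions with scikit-learn. This is not the code from the thread, and the eps/min_samples values are illustrative assumptions.]

```python
# Sketch: group nearby particles (e.g. Particle Lenia positions) into blobs with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

# positions: (N, 2) array of particle coordinates; random data stands in for real state
positions = np.random.rand(200, 2)

# eps = neighborhood radius, min_samples = points needed to form a dense core
labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(positions)

# label -1 marks noise; other labels index the detected clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"found {n_clusters} clusters")
```

DBSCAN is a natural fit for this kind of data because it does not need the number of clusters in advance and labels stray particles as noise rather than forcing them into a blob.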
Caleb @calebfahlgren
@erucipe Yeah it only worked on mobile for me.
Caleb @calebfahlgren
Who said Claude doesn't have a code interpreter :)
Caleb tweet media
Seung Joon Choi @erucipe
If that were possible, it would be feasible to implement things like a CPPN (compositional pattern-producing network) using NumPy inside a Claude artifact.
Seung Joon Choi tweet media
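[Editor's note: to illustrate what "a CPPN using NumPy" might mean in practice, here is a hedged, self-contained sketch. The architecture and activation choices are assumptions for illustration, not anything generated in the artifact.]

```python
# Sketch of a tiny CPPN: every pixel's value is computed from its coordinates
# by the same small, randomly weighted network, which yields smooth patterns.
import numpy as np

def cppn_image(size=128, hidden=32, seed=0):
    rng = np.random.default_rng(seed)
    # Per-pixel input features: x, y, and distance from the center
    xs, ys = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    r = np.sqrt(xs**2 + ys**2)
    inp = np.stack([xs, ys, r], axis=-1).reshape(-1, 3)

    # Random weights; the visual structure comes from composing the activations
    w1 = rng.normal(size=(3, hidden))
    w2 = rng.normal(size=(hidden, hidden))
    w3 = rng.normal(size=(hidden, 1))

    h = np.sin(inp @ w1)
    h = np.tanh(h @ w2)
    out = 1 / (1 + np.exp(-(h @ w3)))  # sigmoid -> grayscale values in [0, 1]
    return out.reshape(size, size)

img = cppn_image()
print(img.shape, float(img.min()), float(img.max()))
```

Because each pixel is evaluated from its coordinates through the same network, the output is smooth and resolution-independent, which is what makes CPPNs attractive for generative imagery in a lightweight, dependency-free setting like an artifact.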
Seung Joon Choi @erucipe
It would be great if artifacts from Claude 3.5 Sonnet could import Pyodide. Unfortunately, the Cloudflare version has CORS issues, and the jsdelivr version cannot be imported from artifacts. Are there any plans to support Pyodide or other WASM in the future? @alexalbert__
Seung Joon Choi @erucipe
@_sholtodouglas Voxel terrain + shading. (The entire code is too long to be generated within Artifacts at once.)
Seung Joon Choi tweet media