Leonardo Lucio Custode (@LLCustode)
Senior AI Researcher
Joined December 2019
366 Following · 70 Followers
52 posts
Leonardo Lucio Custode (@LLCustode):
@udaysy @fchollet Even neural networks can be inspected, modified, and understood up to a certain size. The problem is the scale of the software/networks that makes them black boxes: when the effort of inspecting them becomes sufficiently large, you're likely going to treat them as black boxes.
Uday Yatnalli (@udaysy):
@fchollet The analogy breaks because code isn't a black box. You can read, test, and refactor every line. The spec-quality problem matters, though: most people write vague specs and blame the agent when it takes shortcuts.
François Chollet (@fchollet):
Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached. The result is a black-box model (the generated codebase): an artifact that performs the task, which you deploy without ever inspecting its internal logic, just as we ignore individual weights in a neural network.

This implies that all classic issues encountered in ML will soon become problems for agentic coding: overfitting to the spec, Clever Hans shortcuts that don't generalize outside the tests, data leakage, concept drift, etc.

I would also ask: what will be the Keras of agentic coding? What will be the optimal set of high-level abstractions that allow humans to steer codebase 'training' with minimal cognitive overhead?
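The loop Chollet describes (spec tests as the objective, a coding agent as the optimizer) can be sketched as a toy hill climb. This is only an illustration of the analogy: `propose_patch` is a hypothetical stand-in for a coding agent, not a real API.

```python
# Toy sketch of "agentic coding as machine learning": the spec's tests
# define the objective, and an agent iterates candidate codebases until
# all tests pass. `propose_patch` is a hypothetical agent callback.

def run_tests(code, tests):
    """Score a candidate codebase: fraction of spec tests it passes."""
    passed = sum(1 for t in tests if t(code))
    return passed / len(tests)

def agentic_coding(tests, propose_patch, budget=100):
    code = ""  # start from an empty codebase
    for _ in range(budget):
        candidate = propose_patch(code, tests)
        if run_tests(candidate, tests) > run_tests(code, tests):
            code = candidate  # keep improvements, like a hill climb
        if run_tests(code, tests) == 1.0:
            return code  # "training" converged: the spec is satisfied
    return code  # may still overfit to the tests, as the thread warns
```

Note that, exactly as with ML, a score of 1.0 only means the tests pass, not that the artifact generalizes beyond them.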
Leonardo Lucio Custode (@LLCustode):
@trobuling @fchollet Depending on your position in the company, I think it may be your job to inform your employer (or anyone above you) about potential wrong hardware/infrastructure choices
Troubling Mind (@trobuling):
@fchollet Not actionable if you're an employee. Your employer tells you what hardware to use.
François Chollet (@fchollet):
In engineering, you should first solve your problem within a relaxed design space, and only *then* should you determine the minimal constraints required to implement that solution. Don't settle on the hardware before you know what software you'll need to run. Don't design the robot before you understand the task.
Leonardo Lucio Custode (@LLCustode):
@EdSealing @DamienTeney @ziv_ravid Well, not necessarily; they might be very different paths. Humans have their own biases because of the evolutionary path that led us here. Going from 0 to human, and then from human to optimal, might be more expensive than going from 0 to optimal (e.g. AlphaZero vs AlphaGo).
Ravid Shwartz Ziv (@ziv_ravid):
Andrej is excellent and really knows how to explain complex ideas in a great way, **but** (and here I'm putting on my computational neuroscientist's hat) LLMs/agents/models/computers are not humans. The path to better AI is **not** to mimic how humans think and react. Also, when someone says AI will arrive in 10 years, it just means "it will not happen in the next year; we see improvement, but I don't have a clue when and if we will get there" (which is also my view!)
Quoting Andrej Karpathy (@karpathy):

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment, because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I was going too much on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers:

AGI timelines. My comments on AGI timelines look to be the most trending part of the early response. The "decade of agents" is a reference to this earlier tweet x.com/karpathy/statu… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find at your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs, while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)), and also research to get done, before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI; it's only in contrast to present hype that it doesn't feel that way.

Animals vs Ghosts. My earlier writeup on Sutton's podcast: x.com/karpathy/statu… I am suspicious that there is a single simple algorithm you can let loose on the world that learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. In my mind, animals are not an example of this at all - they are prepackaged with a ton of intelligence by evolution, and the learning they do is quite minimal overall (example: a zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled on an alternative approach to "prepackage" a ton of intelligence in a neural network - not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space: distinct from animals, more like ghosts or spirits. But we can (and should) make them more animal-like over time, and in some ways that's what a lot of frontier work is about.

On RL. I've critiqued RL a few times already, e.g. x.com/karpathy/statu… First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy, because a completion might have lots of errors that get encouraged (if you happen to stumble onto the right answer), and conversely brilliant insight tokens that get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" x.com/karpathy/statu… I've seen a number of papers pop up recently that are imo barking up the right tree, along the lines of what I called "system prompt learning" x.com/karpathy/statu… , but I think there is also a gap between ideas on arxiv and an actual, at-scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon; e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms.

Cognitive core. My earlier post on the "cognitive core": x.com/karpathy/statu… - the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller": x.com/karpathy/statu…

Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: x.com/karpathy/statu… Basically - how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? A case study thereof.

nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials): x.com/karpathy/statu…

On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world, where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I'd like it to prove to me that what it did is correct; I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. I want to learn along the way and become better as a programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well, we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches, etc. x.com/karpathy/statu…

Job automation. How the radiologists are doing great x.com/karpathy/statu… and what jobs are more susceptible to automation and why.

Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cell x.com/karpathy/statu… I have a longer post that has been half-written in my drafts for ~a year, which I hope to finish soon. Thanks again Dwarkesh for having me over!

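Karpathy's "sucking supervision through a straw" point is a credit-assignment issue: outcome-only RL hands every token in a rollout the same scalar reward. The toy sketch below illustrates just that contrast with process supervision; the names are illustrative and this is not any lab's actual training code.

```python
# Toy illustration of credit assignment in outcome-only RL vs
# process supervision. Illustrative only, no real RL library involved.

def outcome_rl_credit(trajectory, reward):
    """Outcome-only reward: every step shares the episode's scalar,
    so a lucky rollout reinforces its mistakes too."""
    return [reward for _ in trajectory]

def process_credit(step_rewards):
    """Process supervision: each step is scored on its own merits."""
    return list(step_rewards)

rollout = ["good step", "bad step", "good step"]
outcome_signal = outcome_rl_credit(rollout, reward=1.0)
# The "bad step" is encouraged exactly as much as the good ones.
process_signal = process_credit([1.0, -1.0, 1.0])
# Here the erroneous middle step is discouraged individually.
```

As the post notes, process supervision has its own issues (e.g. judge reliability); the sketch only shows why the outcome-only signal per flop is so thin.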
Leonardo Lucio Custode (@LLCustode):
@DamienTeney @EdSealing @ziv_ravid Also, imo, sticking too closely to recreating human-like AI will have little value. If you could replicate a human brain with 100% accuracy, it would have the same limitations as humans, which by definition would not enable any super-human behavior.
Damien Teney (@DamienTeney):
@EdSealing @ziv_ravid There's very little in common between biological and artificial neural networks. It's a metaphor at best.
Andrej Karpathy (@karpathy):
Making slides manually feels especially painful now that you know Cursor for slides should exist but doesn’t.
Leonardo Lucio Custode (@LLCustode):
@NeuralRunner The special issue is open both to methodological contributions and to applications in any area where interpretability could add great value.
Neural Runner (@NeuralRunner):
@LLCustode Interpretable RL is crucial for real-world adoption. Is the special issue focusing on any specific domains?
Leonardo Lucio Custode (@LLCustode):
@seth_quant Yes, and in many cases they can match the performance of non-interpretable methods! Looking forward to submissions that close the gap in the many other areas where interpretable RL is still behind.
Seth Quantion (@seth_quant):
@LLCustode Interpretable methods could enhance understanding of RL; looking forward to practical case studies.
Leonardo Lucio Custode (@LLCustode):
@emollick Is it super-human in *all* the experiments though? I see that the results are not statistically significant in various plots
Ethan Mollick (@emollick):
Updated paper by physicians at Harvard, Stanford, and other academic medical centers testing o1-preview for medical reasoning & diagnosis tasks: “In all experiments—both vignettes and emergency room second opinions—the LLM displayed superhuman diagnostic and reasoning abilities.”
Jason Wei (@_jasonwei):
AlphaEvolve is deeply disturbing for RL diehards like yours truly. Maybe midtrain + good search is all you need for AI for scientific innovation. And what an alpha move to keep it secret for a year. Congrats, big G.
Skinner (@skinnnnnnnner):
@fchollet This is a modern myth. We are optimized for walking, not running.
François Chollet (@fchollet):
Long-distance running is one of the defining abilities of our species (we developed it for endurance hunting). It has shaped much of our biology, e.g. the loss of our fur (to facilitate thermoregulation), our uniquely long legs, high lung capacity, etc. Go run for a few hours.
Leonardo Lucio Custode (@LLCustode):
@VictorTaelin Would you say that this is related to the architecture or to the way the architecture is currently trained (i.e., data-driven)?
Taelin (@VictorTaelin):
"perfect next token prediction requires reasoning" Of course it does! *Transformers* aren't reasoning though. And they'll always fail to predict a next token that requires reasoning. Another next-token prediction architecture could very well do it. But transformers can't.
Leonardo Lucio Custode (@LLCustode):
@jsuarez If you look at the search space, in the discrete case you have to pick one value from N possibilities (N being the number of actions), while in the continuous case you have to choose a vector in R^N (in practice [-1, 1]^N), which makes the search space much larger and harder to optimize.
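The search-space contrast can be made concrete with a minimal sketch, assuming a toy setting with N = 4 (the names below are illustrative, not the Gym/PufferLib API):

```python
import random

N = 4  # number of discrete actions, or action dimensions

def sample_discrete():
    # A discrete policy picks one of N possibilities.
    return random.randrange(N)

def sample_continuous():
    # A continuous policy must choose a point in [-1, 1]^N.
    return [random.uniform(-1.0, 1.0) for _ in range(N)]

def grid_size(eps, n=N):
    # Exhaustively covering the discrete space takes N trials, but
    # covering [-1, 1]^n at resolution eps takes (2 / eps)**n points,
    # which is exponential in the action dimension.
    return int(2.0 / eps) ** n
```

With eps = 0.5 and N = 4 the grid already needs 256 points versus 4 discrete trials, which is one way to see why continuous control can be harder to optimize.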
Joseph Suarez 🐡 (@jsuarez):
Is continuous control fundamentally harder to learn than a discrete action space? Project I started yesterday: implement several tasks that can be trained either discrete or continuous. Make the envs ultra-high perf. Run 100B+ steps worth of hyperparam sweeps.
Leonardo Lucio Custode (@LLCustode):
Excited to share that we got four papers accepted @ GECCO 2024! Here's a brief recap of our papers!