Leonardo Lucio Custode (@LLCustode)
Senior AI Researcher
Joined December 2019
366 Following · 70 Followers
52 posts
Leonardo Lucio Custode (@LLCustode):
@udaysy @fchollet Even neural networks can be inspected, modified, and understood up to a certain size. The problem is the scale of the software/networks that makes them black boxes: when the effort of inspecting them becomes sufficiently large, you're likely going to treat them as black boxes.
Uday Yatnalli (@udaysy):
@fchollet The analogy breaks because code isn't a black box. You can read, test, and refactor every line. The spec-quality problem matters, though: most people write vague specs and blame the agent when it takes shortcuts.
François Chollet (@fchollet):
Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached. The result is a black-box model (the generated codebase): an artifact that performs the task, which you deploy without ever inspecting its internal logic, just as we ignore individual weights in a neural network.

This implies that all classic issues encountered in ML will soon become problems for agentic coding: overfitting to the spec, Clever Hans shortcuts that don't generalize outside the tests, data leakage, concept drift, etc.

I would also ask: what will be the Keras of agentic coding? What will be the optimal set of high-level abstractions that allow humans to steer codebase 'training' with minimal cognitive overhead?
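The loop Chollet describes (spec tests as the objective, a coding agent as the optimizer) can be sketched as a toy hill climb. This is only an illustration of the analogy: `propose_patch` is a hypothetical stand-in for a coding agent, not a real API.

```python
# Toy sketch of "agentic coding as machine learning": the spec's tests
# define the objective, and an agent iterates candidate codebases until
# all tests pass. `propose_patch` is a hypothetical agent callback.

def run_tests(code, tests):
    """Score a candidate codebase: fraction of spec tests it passes."""
    passed = sum(1 for t in tests if t(code))
    return passed / len(tests)

def agentic_coding(tests, propose_patch, budget=100):
    code = ""  # start from an empty codebase
    for _ in range(budget):
        candidate = propose_patch(code, tests)
        if run_tests(candidate, tests) > run_tests(code, tests):
            code = candidate  # keep improvements, like a hill climb
        if run_tests(code, tests) == 1.0:
            return code  # "training" converged: the spec is satisfied
    return code  # may still overfit to the tests, as the thread warns
```

Note that, exactly as with ML, a score of 1.0 only means the tests pass, not that the artifact generalizes beyond them.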
Leonardo Lucio Custode (@LLCustode):
@trobuling @fchollet Depending on your position in the company, I think it may be your job to inform your employer (or anyone above you) about potential wrong hardware/infrastructure choices
Troubling Mind (@trobuling):
@fchollet Not actionable if you're an employee. Your employer tells you what hardware to use.
François Chollet (@fchollet):
In engineering, you should first solve your problem within a relaxed design space, and only *then* should you determine the minimal constraints required to implement that solution. Don't settle on the hardware before you know what software you'll need to run. Don't design the robot before you understand the task.
Leonardo Lucio Custode (@LLCustode):
@EdSealing @DamienTeney @ziv_ravid Well, not necessarily; they might be very different paths. Humans have their own biases because of the evolutionary path that led us here. Going from 0 to human, and then from human to optimal, might be more expensive than going from 0 to optimal (e.g. AlphaZero vs AlphaGo).
Ravid Shwartz Ziv (@ziv_ravid):
Andrej is excellent and really knows how to explain complex ideas in a great way, **but** (and here I'm putting on my computational neuroscientist's hat) LLMs/agents/models/computers are not humans. The path to better AI is **not** to mimic how humans think and react. Also, when someone says AI will arrive in 10 years, it just means "it will not happen in the next year; we see improvement, but I don't have a clue when and if we will get there" (which is also my view!)
Quoting Andrej Karpathy (@karpathy):

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment, because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I was going too much on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers:

AGI timelines. My comments on AGI timelines look to be the most trending part of the early response. The "decade of agents" is a reference to this earlier tweet x.com/karpathy/statu… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find at your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs, while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)), and also research to get done, before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI; it's only in contrast to present hype that it doesn't feel that way.

Animals vs Ghosts. My earlier writeup on Sutton's podcast: x.com/karpathy/statu… I am suspicious that there is a single simple algorithm you can let loose on the world that learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. In my mind, animals are not an example of this at all - they are prepackaged with a ton of intelligence by evolution, and the learning they do is quite minimal overall (example: a zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled on an alternative approach to "prepackage" a ton of intelligence in a neural network - not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space: distinct from animals, more like ghosts or spirits. But we can (and should) make them more animal-like over time, and in some ways that's what a lot of frontier work is about.

On RL. I've critiqued RL a few times already, e.g. x.com/karpathy/statu… First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy, because a completion might have lots of errors that get encouraged (if you happen to stumble onto the right answer), and conversely brilliant insight tokens that get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" x.com/karpathy/statu… I've seen a number of papers pop up recently that are imo barking up the right tree, along the lines of what I called "system prompt learning" x.com/karpathy/statu… , but I think there is also a gap between ideas on arxiv and an actual, at-scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon; e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms.

Cognitive core. My earlier post on the "cognitive core": x.com/karpathy/statu… - the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller": x.com/karpathy/statu…

Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: x.com/karpathy/statu… Basically - how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? A case study thereof.

nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials): x.com/karpathy/statu…

On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world, where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I'd like it to prove to me that what it did is correct; I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. I want to learn along the way and become better as a programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well, we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches, etc. x.com/karpathy/statu…

Job automation. How the radiologists are doing great x.com/karpathy/statu… and what jobs are more susceptible to automation and why.

Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cell x.com/karpathy/statu… I have a longer post that has been half-written in my drafts for ~a year, which I hope to finish soon. Thanks again Dwarkesh for having me over!

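Karpathy's "sucking supervision through a straw" point is a credit-assignment issue: outcome-only RL hands every token in a rollout the same scalar reward. The toy sketch below illustrates just that contrast with process supervision; the names are illustrative and this is not any lab's actual training code.

```python
# Toy illustration of credit assignment in outcome-only RL vs
# process supervision. Illustrative only, no real RL library involved.

def outcome_rl_credit(trajectory, reward):
    """Outcome-only reward: every step shares the episode's scalar,
    so a lucky rollout reinforces its mistakes too."""
    return [reward for _ in trajectory]

def process_credit(step_rewards):
    """Process supervision: each step is scored on its own merits."""
    return list(step_rewards)

rollout = ["good step", "bad step", "good step"]
outcome_signal = outcome_rl_credit(rollout, reward=1.0)
# The "bad step" is encouraged exactly as much as the good ones.
process_signal = process_credit([1.0, -1.0, 1.0])
# Here the erroneous middle step is discouraged individually.
```

As the post notes, process supervision has its own issues (e.g. judge reliability); the sketch only shows why the outcome-only signal per flop is so thin.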
Leonardo Lucio Custode (@LLCustode):
@DamienTeney @EdSealing @ziv_ravid Also, imo, sticking too closely to recreating human-like AI will have little value. If you could replicate a human brain with 100% accuracy, it would have the same limitations as humans, which by definition would not enable any super-human behavior.
Damien Teney (@DamienTeney):
@EdSealing @ziv_ravid There's very little in common between biological and artificial neural networks. It's a metaphor at best.
Andrej Karpathy (@karpathy):
Making slides manually feels especially painful now that you know Cursor for slides should exist but doesn’t.
Leonardo Lucio Custode (@LLCustode):
@NeuralRunner The special issue is open both to methodological contributions and to applications in any area where interpretability could add great value.
Neural Runner (@NeuralRunner):
@LLCustode Interpretable RL is crucial for real-world adoption. Is the special issue focusing on any specific domains?
Leonardo Lucio Custode (@LLCustode):
@seth_quant Yes, and in many cases they can match the performance of non-interpretable methods! Looking forward to submissions that close the gap in the many other areas where interpretable RL is still behind.
Seth Quantion (@seth_quant):
@LLCustode Interpretable methods could enhance understanding of RL; looking forward to practical case studies.
Leonardo Lucio Custode (@LLCustode):
@emollick Is it super-human in *all* the experiments though? I see that the results are not statistically significant in various plots
Ethan Mollick (@emollick):
Updated paper by physicians at Harvard, Stanford, and other academic medical centers testing o1-preview for medical reasoning & diagnosis tasks: “In all experiments—both vignettes and emergency room second opinions—the LLM displayed superhuman diagnostic and reasoning abilities.”
Jason Wei (@_jasonwei):
AlphaEvolve is deeply disturbing for RL diehards like yours truly. Maybe midtrain + good search is all you need for AI for scientific innovation. And what an alpha move to keep it secret for a year. Congrats, big G.
Skinner (@skinnnnnnnner):
@fchollet This is a modern myth. We are optimized for walking, not running.
François Chollet (@fchollet):
Long-distance running is one of the defining abilities of our species (we developed it for endurance hunting). It has shaped much of our biology, e.g. the loss of our fur (to facilitate thermoregulation), our uniquely long legs, high lung capacity, etc. Go run for a few hours.
Leonardo Lucio Custode (@LLCustode):
@VictorTaelin Would you say that this is related to the architecture or to the way the architecture is currently trained (i.e., data-driven)?
Taelin (@VictorTaelin):
"perfect next token prediction requires reasoning" Of course it does! *Transformers* aren't reasoning though. And they'll always fail to predict a next token that requires reasoning. Another next-token prediction architecture could very well do it. But transformers can't.
Leonardo Lucio Custode (@LLCustode):
@jsuarez If you look at the search space, in the discrete case you have to pick one value from N possibilities (N being the number of actions), while in the continuous case you have to choose a vector in R^N (in practice [-1, 1]^N), which makes the search space much larger and harder to optimize.
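The search-space contrast can be made concrete with a minimal sketch, assuming a toy setting with N = 4 (the names below are illustrative, not the Gym/PufferLib API):

```python
import random

N = 4  # number of discrete actions, or action dimensions

def sample_discrete():
    # A discrete policy picks one of N possibilities.
    return random.randrange(N)

def sample_continuous():
    # A continuous policy must choose a point in [-1, 1]^N.
    return [random.uniform(-1.0, 1.0) for _ in range(N)]

def grid_size(eps, n=N):
    # Exhaustively covering the discrete space takes N trials, but
    # covering [-1, 1]^n at resolution eps takes (2 / eps)**n points,
    # which is exponential in the action dimension.
    return int(2.0 / eps) ** n
```

With eps = 0.5 and N = 4 the grid already needs 256 points versus 4 discrete trials, which is one way to see why continuous control can be harder to optimize.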
Joseph Suarez 🐡 (@jsuarez):
Is continuous control fundamentally harder to learn than a discrete action space? Project I started yesterday: implement several tasks that can be trained either discrete or continuous. Make the envs ultra-high perf. Run 100B+ steps worth of hyperparam sweeps.
Leonardo Lucio Custode (@LLCustode):
Excited to share that we got four papers accepted @ GECCO 2024! Here's a brief recap of our papers!