Leonardo Cotta
@cottascience
1.3K posts

floptimistic @EITOxford from BH 🔺 🇧🇷
London, UK · Joined July 2018
517 Following · 1.3K Followers
Leonardo Cotta retweeted
Jonathan Gorard@getjonwithit·
I think one of the conclusions we should draw from the tremendous success of LLMs is how much of human knowledge and society exists at very low levels of Kolmogorov complexity. We are entering an era where the minimal representation of a human cultural artifact... (1/12)
189 replies · 496 reposts · 4.5K likes · 751.6K views
Leonardo Cotta@cottascience·
@PMinervini the "structural" part is what's most interesting tbh. It's the prior over programs that should give better insights than traditional Bayesian opt
1 reply · 0 reposts · 0 likes · 49 views
Pasquale Minervini@PMinervini·
and I'm trying to make agents go beyond hyperparam tweaking:
Pasquale Minervini tweet media
1 reply · 0 reposts · 2 likes · 400 views
Pasquale Minervini@PMinervini·
autoresearch agents seem to have issues looping forever, so I'm doing this instead:
Pasquale Minervini tweet media
4 replies · 1 repost · 18 likes · 8.8K views
Leonardo Cotta@cottascience·
@anshulkundaje thanks for this! if every ai+bio paper had a nice/simple codebase like this the field would've made a lot more progress :)
0 replies · 0 reposts · 3 likes · 138 views
Anshul Kundaje@anshulkundaje·
Check out a vetted Pytorch port of AlphaGenome by GenomicsxAI collaborative team. See QT thread (with links to code & blogpost). Various fine tuning modules + tutorials coming next. Community building announcements coming soon as well. Follow blog for latest updates.
Alejandro Buendia@abuen_dia

Thrilled to announce alphagenome-pytorch, an accurate, readable, and careful port of AlphaGenome's architecture and weights to PyTorch. Work with @gtcaa @m_kjellberg @chriswzou @tuxinming as part of the GenomicsxAI initiative between @anshulkundaje and @pkoo562 labs.

3 replies · 24 reposts · 120 likes · 15.5K views
Andrej Karpathy@karpathy·
@a_karvonen On one branch of exploration yesterday an agent noticed that switching the order of the QK Norm and RoPE worked better. Which hyperparameter does that?
20 replies · 5 reposts · 377 likes · 45.6K views
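Karpathy's point above is easy to check numerically. RoPE is a rotation, so a parameter-free RMS norm actually commutes with it; but as soon as the norm carries a per-channel gain (as QK norms often do), the two orderings genuinely differ. A stdlib-only sketch — head dimension, position, and the gain values are all illustrative, not nanochat's actual code:

```python
import math
import random

def rope(x, pos=3, base=10000.0):
    # rotary embedding: rotate consecutive channel pairs by position-dependent angles
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos / base ** (i / (d // 2))
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out

def rmsnorm(x, gain):
    # RMS norm with a per-channel gain
    rms = math.sqrt(sum(v * v for v in x) / len(x) + 1e-6)
    return [g * v / rms for g, v in zip(gain, x)]

rng = random.Random(0)
q = [rng.gauss(0, 1) for _ in range(8)]            # one hypothetical query head
gain = [rng.uniform(0.7, 1.3) for _ in range(8)]   # per-channel scale (hypothetical)

a = rope(rmsnorm(q, gain))   # norm -> RoPE
b = rmsnorm(rope(q), gain)   # RoPE -> norm
diff = max(abs(u - v) for u, v in zip(a, b))       # nonzero: order matters with a gain

ones = [1.0] * 8
same = max(abs(u - v) for u, v in
           zip(rope(rmsnorm(q, ones)), rmsnorm(rope(q), ones)))
# with a gain-free norm the two orders coincide, since rotation preserves RMS
```

So "which hyperparameter does that?" is fair: the ordering is a structural choice, and it only becomes a real choice once the norm has learnable scale.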
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
Andrej Karpathy tweet media
966 replies · 2.1K reposts · 19.3K likes · 3.5M views
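The workflow Karpathy describes — propose a change, run a cheap experiment, keep the change only if validation loss improves, plan the next proposal from the results — is at its core greedy hill-climbing over the training config. A toy sketch of that loop; the quadratic `proxy_val_loss`, the config keys, and all numeric values are hypothetical stand-ins for a real short training run:

```python
import random

def proxy_val_loss(cfg):
    # stand-in for a cheap training run; a toy bowl with optimum lr=0.02, wd=0.1
    return (cfg["lr"] - 0.02) ** 2 + (cfg["wd"] - 0.1) ** 2

def autoresearch(cfg, steps=200, seed=0):
    rng = random.Random(seed)
    best = proxy_val_loss(cfg)
    for _ in range(steps):
        trial = dict(cfg)
        key = rng.choice(sorted(trial))
        trial[key] *= rng.uniform(0.8, 1.25)   # propose a small multiplicative tweak
        loss = proxy_val_loss(trial)
        if loss < best:                        # keep only changes that improve val loss
            cfg, best = trial, loss
    return cfg, best

cfg, loss = autoresearch({"lr": 0.05, "wd": 0.5})
```

The real version swaps the quadratic for an actual training run (or a smaller proxy model) and the random tweak for an LLM agent proposing code changes, but the accept-if-validation-improves skeleton is the same.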
Leonardo Cotta retweeted
New Scientist@newscientist·
Chris Maddison was just an intern when he started working on the Go-playing AI that would eventually become AlphaGo. A decade later, he talks about that match against Lee Sedol and what came next. newscientist.com/article/251845…
0 replies · 4 reposts · 9 likes · 6K views
Leonardo Cotta retweeted
Leonardo de Moura@Leonard41111588·
AI is writing a growing share of the world's software. No one is formally verifying any of it. New essay: "When AI Writes the World's Software, Who Verifies It?" leodemoura.github.io/blog/2026/02/2…
41 replies · 248 reposts · 1.6K likes · 418.4K views
Leonardo Cotta retweeted
Quentin Berthet@qberthet·
🚨 🔬 PhD positions at Google DeepMind in France 🇫🇷 We are advertising Master Level Intern positions at Google DeepMind within our Frontier AI Unit. These could lead to co-advised PhD positions with Google DeepMind and French academic institutions. job-boards.greenhouse.io/deepmind/jobs/…
8 replies · 60 reposts · 595 likes · 52.1K views
Leonardo Cotta@cottascience·
@anshulkundaje our biggest mistake as computer scientists is to think of the real world like software. scaling something on a computer is much, much easier. measurement errors, broken machines, temperature variance, and heavy rains don't exist in Lean
0 replies · 0 reposts · 2 likes · 202 views
Anshul Kundaje@anshulkundaje·
"There’s no reason at this point that you need to have grad students pipetting one thing into another thing". Release one carefully curated pilot case study piggybacked on recent breakthroughs. Claim grad students' pipetting days are over. Huge potential, but no need for this.
TBPN@tbpn

OpenAI's @kevinweil says 24/7 robotic labs could automate scientific discovery using "reinforcement learning with a loop through the real world":

"There’s a lot of science that can be totally automated. There’s no reason at this point that you need to have grad students pipetting one thing into another thing."

"The idea is to have robotic labs that are online 24/7 and can scale in parallel. You have models reasoning for two days to find the most efficient experiments to run; once they get to a good point, they pass that to a robotic lab which can experiment in parallel at high volume."

"The results pass back into a model which reasons about the results and then goes out and runs a different set of experiments. You’re doing reinforcement learning with a loop through the real world."

7 replies · 5 reposts · 73 likes · 9.4K views
Leonardo Cotta@cottascience·
@difficultyang another common mistake is to do reduction='mean' with ragged batches in sequence learning
0 replies · 0 reposts · 1 like · 1.9K views
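The ragged-batch pitfall in the reply above, in a few lines: `reduction='mean'` averages within each batch first, so tokens in short batches get weighted far more heavily than tokens in long ones. A tiny numeric illustration (the loss values are hypothetical):

```python
# per-token losses for two ragged "batches"
batch_a = [1.0, 1.0, 1.0, 1.0]   # 4 tokens
batch_b = [5.0]                  # 1 token

# reduction='mean' per batch, then averaged across batches:
per_batch_mean = (sum(batch_a) / len(batch_a) + sum(batch_b) / len(batch_b)) / 2  # 3.0

# what you usually want: every token weighted equally
per_token_mean = (sum(batch_a) + sum(batch_b)) / (len(batch_a) + len(batch_b))    # 1.8
```

The lone token in `batch_b` ends up counting four times as much as each token in `batch_a`. The usual fix is to sum the per-token losses and divide by the total (non-padding) token count.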
difficultyang@difficultyang·
Interview question I would have failed prior to today: why does PyTorch's DistributedDataParallel give incorrect gradients when the global objective function is computed as a sum over per-sample loss?
17 replies · 16 reposts · 429 likes · 54.7K views
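One standard answer to the interview question above, sketched numerically rather than with actual `torch.distributed` code: PyTorch's DDP all-reduces gradients with an average over processes. If your global objective is the *sum* of per-sample losses, the true gradient is the sum of the per-rank gradients, but the gradient DDP hands back is that sum divided by the world size (the gradient values here are hypothetical):

```python
# gradients of a global-sum objective, computed independently on two ranks
rank_grads = [2.0, 4.0]                         # world_size = 2

true_grad = sum(rank_grads)                     # gradient of the global sum: 6.0
ddp_grad = sum(rank_grads) / len(rank_grads)    # DDP averages across ranks: 3.0
scale_error = true_grad / ddp_grad              # off by exactly world_size
```

For a pure sum objective this is "only" a uniform rescaling of the gradient (fixable by scaling the loss by the world size); combined with per-rank means over ragged sample counts, the per-sample weighting itself becomes wrong, not just the scale.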
Leonardo Cotta retweeted
Séb Krier@sebkrier·
Today I learnt that in 2009, neuroscientists placed a dead Atlantic salmon into an fMRI scanner, scanned it, and that this apparently has implications for AI interpretability. 🐟

They showed the dead fish pictures of humans in social situations and "asked" the fish to determine the emotions of the people. When they ran their standard statistical software, the results showed "brain activity" in the fish that correlated with the emotions. Obviously, the fish was not thinking; the "activity" was just random noise. The point of the study was to show that if you don't correct for statistical noise and use rigorous controls, your tools will find patterns where none exist.

This paper claims that the same lesson should be applied in interpretability work: many researchers use various tools to explain what is happening inside a neural network (e.g. probes, SAEs, etc.). But some of these convincing-looking explanations can also be extracted from randomly initialized and untrained AI models (the dead-salmon equivalent): saliency maps remain plausible after weight randomization, sparse autoencoders find interpretable components in random transformers, etc.

The authors propose that we stop treating interpretability as "storytelling" and start treating it as statistical inference: doing null hypothesis testing, quantifying uncertainty more systematically, interpreting explanations as a simplified surrogate model, etc. Although they also acknowledge that finding some signal in random networks doesn't automatically invalidate finding stronger signals in trained ones.

I'm not an interpretability researcher myself but would be curious to hear takes! arxiv.org/abs/2512.18792
Séb Krier tweet media
85 replies · 586 reposts · 5K likes · 387.9K views
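The salmon result is the multiple-comparisons problem, and it is easy to reproduce without a scanner: test thousands of pure-noise "voxels" against a stimulus and a predictable fraction will look significant by chance. A stdlib-only simulation (voxel counts, scan length, and the |r| threshold are illustrative; the threshold approximates the two-tailed p < .05 cutoff for n = 20):

```python
import random

random.seed(0)
n_voxels, n_scans = 5000, 20
stimulus = [random.gauss(0, 1) for _ in range(n_scans)]

def corr(x, y):
    # Pearson correlation coefficient
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

# every "voxel" is pure noise, yet without a multiple-comparisons correction
# a sizeable batch of them clears the per-test significance bar
hits = sum(
    1
    for _ in range(n_voxels)
    if abs(corr([random.gauss(0, 1) for _ in range(n_scans)], stimulus)) > 0.444
)
false_positive_rate = hits / n_voxels   # ~0.05, despite zero real signal
```

This is the same control the paper asks of interpretability work: run the probe or SAE on a randomly initialized model first, and only trust findings that clearly beat that null.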
Leonardo Cotta retweeted
Elad Hazan@HazanPrinceton·
New year, new research program: learning from a dynamical systems perspective, with @shai_s_shwartz and Nati Srebro. @Princeton students — this will be a major focus of COS 511!
Elad Hazan tweet media
8 replies · 84 reposts · 585 likes · 42.6K views
Leonardo Cotta retweeted
Christopher Morris@chrsmrrs·
GraphBench: Next-generation graph learning benchmarking is now available! 🔍📊 This work introduces GraphBench: a comprehensive benchmarking framework for graph learning that provides principled baselines and reference performance across modern models. graphbench.io
Christopher Morris tweet media
2 replies · 14 reposts · 41 likes · 5.8K views
Leonardo Cotta retweeted
Prof. Anna Brown@KiaNeu6·
Curing diseases is not a jobs program for humans to feel special about their little existence.
Julian Togelius@togelius

I was at an event on AI for science yesterday, a panel discussion here at NeurIPS. The panelists discussed how they plan to replace humans at all levels in the scientific process. So I stood up and protested that what they are doing is evil.

Look around you, I said. The room is filled with researchers of various kinds, most of them young. They are here because they love research and want to contribute to advancing human knowledge. If you take the human out of the loop, meaning that humans no longer have any role in scientific research, you're depriving them of the activity they love and a key source of meaning in their lives. And we all want to do something meaningful. Why, I asked, do you want to take the opportunity to contribute to science away from us?

My question changed the course of the panel, and set the tone for the rest of the discussion. Afterwards, a number of attendees came up to me, either to thank me for putting what they felt into words, or to ask if I really meant what I said. So I thought I would return to the question here.

One of the panelists asked whether I would really prefer the joy of doing science to finding a cure for cancer and enabling immortality. I answered that we will eventually cure cancer and at some point probably be able to choose immortality. Science is already making great progress with humans at the helm. We'll get fusion power and space travel some day as well. Maybe cutting humans out of the loop could speed up this process, but I don't think it would be worth it. I think it is of crucial importance that we humans are in charge of our own progress. Expanding humanity's collective knowledge is, I think, the most meaningful thing we can do. If humans could not usefully contribute to science anymore, this would be a disaster. So, no. I do not think it worth it to find a cure for cancer faster if that means we can never do science again.

Many of those who came up to talk to me last night, those who asked me whether I was being serious or just trolling, thought that the premise was absurd. Of course there would always be room for humans in science. There will always be tasks only humans can do, insight only humans have, and so on. Therefore, we should welcome AI. Research is hard, and we need all the help we can get.

I responded that I hoped they were right. That is, I truly hope there will always be parts of the research process which humans will be essential for. But what I was arguing against was not what we might call "weak science automation", where humans stay in the loop in important roles, but "strong science automation", where humans are redundant.

Others thought it was premature to argue about this, because full science automation is not on the horizon. Again, I hope they are right. But I see no harm in discussing it now. And I certainly don't think we need research on science automation to go any further.

Yet others remarked that this was a pointless argument. Science automation is coming whether we want it or not, and we'd better get used to it. The train is coming, and we can get on it or stand in its way. I think that is a remarkably cowardly argument. It is up to us as a society to decide how we use the technology we develop. It's not a train, it's a truck, and we'd better grab the steering wheel.

One of the panelists made a chess analogy, arguing that lots of people play chess even though computers are now much better than humans at chess. So we might engage in science as a kind of hobby, even though the real science is done by computers. We would be playing around far from the frontier, perhaps filling in the blanks that AI systems don't care about. That was, to put it mildly, not a satisfying answer. While I love games, I certainly do not consider game-playing as meaningful as advancing human knowledge. Thanks, but no thanks.

Overall, though, it was striking that most of those I talked to thanked me for raising the point, as I articulated worries that they already had. One of them remarked that if you work on automating science and are not even a little bit worried about the end goal, you are a psychopath. I would add that another possibility is that you don't really believe in what you are doing.

Some might ask why I make this argument about science and not, for example, about visual art, music, or game design. That's because yesterday's event was about AI for science. But I think the same argument applies to all domains of human creative and intellectual expression. Making human intellectual or creative work redundant is something we should avoid when we can, and we should absolutely avoid it if there are no equally meaningful new roles for humans to transition into.

You could further argue that working on cutting humans out of meaningful creative work such as scientific research is incredibly egoistic. You get the intellectual satisfaction of inventing new AI methods, but the next generation doesn't get a chance to contribute. Why do you want to rob your children (academic and biological) of the chance to engage in the most meaningful activity in the world?

So what do I believe in, given that I am an AI researcher who actively works on the kind of AI methods used for automating science? I believe that AI tools that help us be more productive and creative are great, but that AI tools that replace us are bad. I love science, and I am afraid of a future where we are pushed back into the dark ages because we can no longer contribute to science. Human agency, including in creative processes, is vital and must be safeguarded at almost any cost. I don't exactly know how to steer AI development and AI usage so that we get new tools but are not replaced. But I know that it is of paramount importance.

0 replies · 2 reposts · 10 likes · 1.8K views
Leonardo Cotta retweeted
Ashish Vaswani@ashVaswani·
We are beyond thrilled to share our first flagship models, Rnj-1 base and instruct 8B parameter models. Rnj-1 is the culmination of 10 months of hard work by a phenomenal team, dedicated to advancing American SOTA OSS AI. Lots of wins with Rnj-1:
1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME’25) nearly at par with GPT-OSS MoE 20B.
…
Essential AI@essential_ai

Today, we’re excited to introduce Rnj-1, @essential_ai's first open model; a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open-source at par with the best in the world.

103 replies · 171 reposts · 1.8K likes · 603.2K views
Leonardo Cotta retweeted
Alessandro Sordoni@murefil·
This was a great group effort ❤️. Check the thread below! My 2c: we train a 32B coding agent by distilling a strong teacher model on a mix of real and synthetic bugs generated by our new approach, BugPilot 🛩️! BugPilot creates bugs unintentionally: it asks the teacher to insert new features in a given repo, then checks whether the synthetic feature breaks existing functionality... a bug is born 🐣! Claude is such a strong teacher model that, paired with our bugs, our 32B FrogBoss 🐸 (because it eats bugs) model achieves 54.6 pass@1 (avg of 3 seeds) and ~67 pass@3. Just selecting the shortest over 3 (minimal TTS) gets us to ~56.8. 🚨 Internships: we have many ideas, so we'd be excited if you want to work with us going forward. Please apply! jobs.careers.microsoft.com/global/en/job/…
Isadora White@isadorcw

Excited to introduce our SoTA coding models, FrogBoss (32B) and FrogMini (14B), on SWE-Bench-Verified! (FrogBoss eats bugs… like a boss) 🐸🪲 These models were trained with bugs from a mix of existing and our new synthetic bug generation approach, called BugPilot. (1/n)

0 replies · 7 reposts · 40 likes · 9.6K views
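The BugPilot selection step described above — a teacher adds a feature, and the edit is kept as a synthetic bug only if previously passing tests now fail — can be sketched as a simple predicate. The test names and the dict-based pass/fail report are hypothetical, not the project's actual interface:

```python
def is_synthetic_bug(tests_before, tests_after):
    """Keep a feature-adding edit as a synthetic bug iff at least one
    previously passing test now fails. Dicts map test name -> passed?"""
    regressions = [
        name for name, passed in tests_before.items()
        if passed and not tests_after.get(name, False)
    ]
    return len(regressions) > 0, regressions

ok, regs = is_synthetic_bug(
    {"test_parse": True, "test_emit": True},    # test results before the feature edit
    {"test_parse": True, "test_emit": False},   # after: the new feature broke test_emit
)
```

The appeal of this criterion is that the bugs come with a built-in reproduction signal (the failing tests), which is exactly what a coding agent needs as a training target.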