Levi Lelis

581 posts


@levilelis

Associate Professor - University of Alberta - Canada CIFAR AI Chair (he/him, ele/dele). Machine learning as program search.

Edmonton, Canada · Joined July 2009
564 Following · 721 Followers
Pinned Tweet
Levi Lelis@levilelis·
I recently spoke at IPAM's Naturalistic Approaches to Artificial Intelligence Workshop, and shared some of the programmatic perspectives we're exploring in reinforcement learning research. youtu.be/UNpg05yxc3o?si…
Levi Lelis retweeted
Marlos C. Machado@MarlosCMachado·
A couple of months ago, we released a preprint of one of my favourite papers I’ve ever written. It lies at the intersection of representation learning and neuroscience. I have now written a blog post about it. Preprint: biorxiv.org/content/10.110… Blog post: medium.com/@marlos.cholodovskis/from-pixels-to-place-cells-where-representation-learning-meets-neuroscience-72140afe6e3f
Levi Lelis retweeted
Amii@AmiiThinks·
Amii is hiring a Machine Learning Resident (1-year term) to work with ConeTec! Help solve critical safety challenges in geocharacterization using LLMs, OCR, and Deep Learning. 📍 Edmonton (Hybrid) 📅 Apply by March 4, 2026 🔗 amii.bamboohr.com/careers/231
Levi Lelis retweeted
Amii@AmiiThinks·
Amii is hiring a Machine Learning Scientist to lead our ML Educators and scale AI literacy across Canada. If you have a background in ML research, people leadership, and a passion for AI for good, apply now: amii.bamboohr.com/careers/229
Levi Lelis retweeted
Clem Bonnet@ClementBonnet16·
Insightful thread about world models with ideas that very few people in the industry understand! Building static, giant world models is a dead end for achieving human-level adaptation to new tasks. Instead, it's all about efficiently adapting local models of the world. The community should develop systems that produce world models (à la program synthesis) rather than static models.
Edward Hu@edward_s_hu

Nobody asked, but here are 4 world-model papers that I read early on in my PhD and still ponder over now:
- Value Equivalence Principle
- Learning Awareness Models
- Embedded Agency (figure below)
- Big World Hypothesis
See the thread for details:

Levi Lelis retweeted
Amjad Masad@amasad·
To make a bit of an excuse for Microsoft: the world is just waking up to the fact that coding agents are general agents. It’s bitter-lesson-adjacent: writing and executing code will likely outperform years of handcrafting vertical-specific agents with expert knowledge. Actually, it might map exactly onto the bitter lesson: program synthesis is a form of scalable search.
Levi Lelis@levilelis·
Interesting work! We observed similar behavior in our work on programmatic strategies applied to an RTS game. In particular, training an agent to defeat all previous versions of itself is an implementation of fictitious play, which we found leads to more robust programs than iterated best response (which only plays with the latest version of the agent).
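For readers unfamiliar with the distinction, here is a minimal sketch of the two training schemes. The `best_response` oracle is a hypothetical stand-in for whatever search procedure (e.g., program synthesis) produces a strategy that beats a given set of opponents; none of these names come from the paper itself.

```python
def iterated_best_response(initial_agent, best_response, iterations):
    """Each new agent only needs to beat the single latest version,
    which can cycle or overfit to one opponent."""
    agent = initial_agent
    for _ in range(iterations):
        agent = best_response([agent])
    return agent

def fictitious_play(initial_agent, best_response, iterations):
    """Each new agent must beat ALL previous versions of itself,
    which pressures the search toward more robust strategies."""
    pool = [initial_agent]
    for _ in range(iterations):
        pool.append(best_response(pool))  # best response to the whole pool
    return pool[-1]
```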
Sakana AI@SakanaAILabs·
Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs. Blog: sakana.ai/drq

Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War.

Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors. We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust.

Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors—mirroring how distinct species in nature often evolve similar traits to solve the same problems.

Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.

This project is a collaboration between MIT and Sakana AI led by @akarshkumar0101

Full Paper (Website): pub.sakana.ai/drq/
Full Paper (arXiv): arxiv.org/abs/2601.03335
Code: github.com/SakanaAI/drq/
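As a rough illustration of the kind of loop described above — a sketch in the spirit of the announcement, not the paper's actual algorithm or API — where `llm_mutate` and `battle` are hypothetical callbacks (an LLM proposing a revised warrior, and the Core War simulator deciding a match):

```python
import random

def red_queen_evolution(seed_warrior, llm_mutate, battle, generations, pop_size=8):
    """Toy sketch of adversarial program evolution: every warrior must
    keep adapting against an ever-shifting opponent pool just to survive."""
    population = [seed_warrior] * pop_size
    for _ in range(generations):
        next_population = []
        for warrior in population:
            opponent = random.choice(population)        # ever-changing competitor
            challenger = llm_mutate(warrior, opponent)  # LLM proposes a revision
            # The revision survives only if it wins its match.
            next_population.append(challenger if battle(challenger, opponent) else warrior)
        population = next_population
    return population
```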
Levi Lelis retweeted
Marlos C. Machado@MarlosCMachado·
The Department of Computing Science at the University of Alberta has an opening for another tenure-track faculty position in robotics. Please spread the word. I can attest to how awesome @UAlbertaCS and @AmiiThinks are! (Official job posting coming soon.)
Sebastian Risi@risi1979·
I’m beyond excited to announce our MIT Press book on Neuroevolution! An HTML version is now available for free on neuroevolutionbook.com, with a print edition coming out later in 2026.

Real intelligence is not static; it evolves. For decades, the field of neuroevolution has pursued this necessary adaptability. Our book chronicles its development, from early concepts to its modern integration with deep learning and reinforcement learning, exploring its potential for understanding the origins of intelligence and its real-world applications.

And the companion webpage is more than just a book site! It comes equipped with interactive demos, videos, exercises, and tutorials to allow everyone to experience neuroevolution in action. Check it out and let us know what you think!

It was a pleasure to work on this book over the last 4+ years with David (@hardmaru), Yujin (@yujin_tang), and Risto. We are incredibly proud of the result and look forward to celebrating! We hope to connect with many of you at NeurIPS.

We are very grateful to Melanie Mitchell (@MelMitchell1), who provided a fantastic foreword. To quote her: “The next big thing in AI is coming, and I suspect that neuroevolution will be a major part of it”. We think so too!
Lucas Vegi@lucasvegi·
It's still hard to believe! ❤️🙏
Levi Lelis retweeted
Cohere Labs@Cohere_Labs·
Join our Reinforcement Learning Group next week on Monday, September 29th for a session with Esraa Elelimy on "Deep Reinforcement Learning with Gradient Eligibility Traces." Thanks to @rahul_narava for organizing this event ✨ Learn more: cohere.com/events/cohere-…
Levi Lelis retweeted
Richard Sutton@RichardSSutton·
My acceptance speech at the Turing award ceremony:

Good evening ladies and gentlemen. The main idea of reinforcement learning is that a machine might discover what to do on its own, without being told, from its own experience, by trial and error. As far as I know, the first person to propose this was Alan Turing in 1947, which makes it particularly gratifying and humbling to receive this award in his name for reviving this essential but still nascent idea.

I have three people that I would like to particularly thank. First, Andy Barto. As my PhD supervisor he taught me my whole approach to science, and in particular instilled in me an appreciation of scholarship and craft, and of the great breadth of prior work. Second, I would like to thank Oliver Selfridge, my other main mentor; sadly, now deceased. Oliver taught me how keeping ideas simple can be the boldest of all ambitions. Third, I want to thank Martha Steenstrup, my life partner and intellectual sparring partner. She keeps me honest and grounded.

Finally, I also want to thank the University of Alberta, which has been an ideal environment for me and for reinforcement learning research these past 22 years. These three people and my university have reinforced in me the ambition to have ideas that matter, without getting too full of myself about it. They taught me that the quest for better ideas is serious, but is best approached playfully, with humility, kindness, and optimism. For this I am eternally grateful.

I would also like to thank all of you for being here and for celebrating the pursuit of intellectual excellence. Thank you very much.
Levi Lelis@levilelis·
Rina’s work has inspired me since my early days as a PhD student. I’m so happy to see her receive this very well-deserved award. Congratulations, Rina!
IJCAIconf@IJCAIconf

#IJCAI2025 What inspires her research? Rina Dechter, 2025 IJCAI Research Excellence Award recipient, takes us on a journey in her #Invited talk: Graphical Models Meet Heuristic Search: A Personal Journey into Automated Reasoning 📆 22 August, 2 PM 🌐 2025.ijcai.org/invited-talks/

Levi Lelis retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
@DimitrisPapail Test Time Compute was "invented" the same way America was "discovered".. x.com/rao2z/status/1…
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

Inference Time Computation is NOT new--we wanted to get rid of it, but are letting it back in out of necessity.. #SundayHarangue (#NeurIPS2024 workshop edition)

Noam Brown @polynoamial has been giving talks on o1 suggesting that including inference time computation was a relatively newer idea in games (which he and others have brought to LLMs). While I have a lot of respect for Noam (he is probably one of the handful of frontier folks who actually has a good understanding of pre-2013 AI), I am afraid that in this particular case, his characterization gets the chronology mostly wrong for prominent games--most of which were "deliberative" rather than "reflex" agents in Russell/Norvig's terminology.

Many games--including Chess and Go--focused exclusively on inference time compute in the beginning! (See my Intro #AI slide below..) The 1997 Deep Blue, for example, was all inference time compute (using alpha-beta pruning on shallow game trees where the leaves were evaluated by (mostly hand-coded) evaluation functions), plus a library of end games. Pre-AlphaGo approaches for Go went with just MCTS at inference time. For these games, it is the idea of learning an approximate policy off-line and using it to complement the already standard inference time computation (via generalized policy rollout) that is the later development!

TD-Gammon is more of an exception, which tried first spending time "off-line" to get a policy and make a largely reflex agent. (Partly because between Samuel's Checkers and TD-Gammon, there were few RL/Learning-based approaches for Games..)

So when Noam says he and Tuomas started poker first with off-line approximate policy learning and just using that during inference time, and then recognized that online policy rollout is actually helping, they went in the reverse order of what happened in Chess and Go! The appeal of the off-line policy computation was that you can spend unlimited amounts of time behind the scenes, so that online computation can be mostly taken out. Many of us still remember marveling at the fact that the learning time for AlphaGo would have been about 1700 years on the common desktops of that time!

This learn-everything-upfront approach with a close-to-reflex agent became so strong in the aftermath of AlphaGo that they tried their best to reduce the MCTS search that was left over in the original AlphaGo in a stream of later developments. Reducing user-facing inference time computation was a deliberate choice--and has even led to a change in the way the field started viewing computational complexity considerations (see x.com/rao2z/status/1…). This trend continued with LLMs too--with all focus on pre-training so inference time compute is kept negligible--so that it costs very little for end users, and can even be done locally on edge devices..

I suspect that the twin bitter truths (c.f. x.com/rao2z/status/1…)--that you can't quite get reasoning out of auto-regressive LLMs, and that learning an approximate pseudo-cot-action policy that is good enough would be way too costly even for OAI/Microsoft's resources--dragged them into inference time compute awkwardness (c.f. x.com/rao2z/status/1…), which certainly changes the business model by pushing the scaling costs to the edge users! (x.com/rao2z/status/1…)
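For context, the Deep Blue-style setup described above is pure inference-time search: alpha-beta pruning over a shallow game tree with a hand-coded evaluation at the leaves and no learned policy. A minimal sketch, where the `evaluate` and `children` callbacks are hypothetical placeholders for a concrete game:

```python
def alpha_beta(state, depth, alpha, beta, maximizing, evaluate, children):
    """All the 'thinking' happens at inference time: search the game tree
    to `depth`, scoring leaf positions with a hand-coded evaluation."""
    moves = children(state, maximizing)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in moves:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:   # prune: the minimizing player avoids this branch
                break
        return value
    value = float("inf")
    for child in moves:
        value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                      True, evaluate, children))
        beta = min(beta, value)
        if alpha >= beta:       # prune: the maximizing player avoids this branch
            break
    return value
```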

Ayush Jain@Ayushj240·
Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-Functions 📎openreview.net/forum?id=H3jcT… Grateful to my advisors @JosephLim_AI and @ebiyik_!
Ayush Jain@Ayushj240

At @RL_Conference🍁, I'm presenting a talk and a poster on Aug 6, Track 1: Reinforcement Learning Algorithms. We find that Deterministic Policy Gradient methods like TD3 often get stuck at local optima under complex Q-functions, and propose a novel actor architecture! 🧵
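For readers unfamiliar with the failure mode: a TD3/DDPG-style actor is trained by gradient ascent on the critic's Q, so it climbs toward the nearest local maximum of Q over actions. A minimal PyTorch-style sketch of that generic update (not the paper's proposed architecture; `actor` and `critic` are assumed to be ordinary nn.Modules and `optimizer` a torch optimizer over the actor's parameters):

```python
def dpg_actor_step(actor, critic, states, optimizer):
    """One deterministic policy gradient step: ascend Q(s, pi(s)) by
    backpropagating through the critic. If Q is multi-modal in the
    action, this local ascent can leave the actor stuck at a
    suboptimal mode -- the issue the paper targets."""
    actions = actor(states)                  # a = pi(s)
    loss = -critic(states, actions).mean()   # maximize Q  <=>  minimize -Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```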

Levi Lelis retweeted
Shao-Hua Sun@shaohua0116·
Kicking off #RLC2025 with our Workshop on Programmatic Reinforcement Learning! This workshop explores how programmatic representations can improve interpretability, generalization, efficiency, and safety in RL.
Levi Lelis retweeted
Ndea@ndea·
Are programmatic policies really better at generalizing OOD than neural policies, or are the benchmarks biased? This position paper revisits 4 prior studies and finds neural policies can match programmatic ones - if you adjust training (sparse observation, reward shaping, etc.)