Levi Lelis

581 posts


@levilelis

Associate Professor - University of Alberta - Canada CIFAR AI Chair (he/him, ele/dele). Machine learning as program search.

Edmonton, Canada · Joined July 2009
564 Following · 721 Followers
Pinned Tweet
Levi Lelis@levilelis·
I recently spoke at IPAM's Naturalistic Approaches to Artificial Intelligence Workshop, and shared some of the programmatic perspectives we're exploring in reinforcement learning research. youtu.be/UNpg05yxc3o?si…
Levi Lelis retweeted
Marlos C. Machado@MarlosCMachado·
A couple of months ago, we released a preprint of one of my favourite papers I’ve ever written. It lies at the intersection of representation learning and neuroscience. I have now written a blog post about it. Preprint: biorxiv.org/content/10.110… Blog post: medium.com/@marlos.cholodovskis/from-pixels-to-place-cells-where-representation-learning-meets-neuroscience-72140afe6e3f
Levi Lelis retweeted
Amii@AmiiThinks·
Amii is hiring a Machine Learning Resident (1-year term) to work with ConeTec! Help solve critical safety challenges in geocharacterization using LLMs, OCR, and Deep Learning. 📍 Edmonton (Hybrid) 📅 Apply by March 4, 2026 🔗 amii.bamboohr.com/careers/231
Levi Lelis retweeted
Amii@AmiiThinks·
Amii is hiring a Machine Learning Scientist to lead our ML Educators and scale AI literacy across Canada. If you have a background in ML research, people leadership, and a passion for AI for good, apply now: amii.bamboohr.com/careers/229
Levi Lelis retweeted
Clem Bonnet@ClementBonnet16·
Insightful thread about world models with ideas that very few people in the industry understand! Building static, giant world models is a dead end for achieving human-level adaptation to new tasks. Instead, it's all about efficiently adapting local models of the world. The community should develop systems that produce world models (à la program synthesis) rather than static models.
Edward Hu@edward_s_hu

Nobody asked, but here are 4 world-model papers that I read early on in my PhD and still ponder over now:
- Value Equivalence Principle
- Learning Awareness Models
- Embedded Agency (figure below)
- Big World Hypothesis
See the thread for details:

Levi Lelis retweeted
Amjad Masad@amasad·
To make a bit of an excuse for Microsoft: the world is just waking up to the fact that coding agents are general agents. It’s bitter-lesson-adjacent: writing and executing code will likely outperform years of handcrafting vertical-specific agents with expert knowledge. Actually, it might map exactly onto the bitter lesson: program synthesis is a form of scalable search.
Levi Lelis@levilelis·
Interesting work! We observed similar behavior in our work on programmatic strategies applied to an RTS game. In particular, training an agent to defeat all previous versions of itself is an implementation of fictitious play, which we found leads to more robust programs than iterated best response (which only plays with the latest version of the agent).
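For readers unfamiliar with the distinction, here is a minimal sketch of the two training schemes. The `best_response` oracle is a hypothetical stand-in for whatever search procedure (e.g., program synthesis) produces a strategy that beats a given set of opponents; none of these names come from the paper itself.

```python
def iterated_best_response(initial_agent, best_response, iterations):
    """Each new agent only needs to beat the single latest version,
    which can cycle or overfit to one opponent."""
    agent = initial_agent
    for _ in range(iterations):
        agent = best_response([agent])
    return agent

def fictitious_play(initial_agent, best_response, iterations):
    """Each new agent must beat ALL previous versions of itself,
    which pressures the search toward more robust strategies."""
    pool = [initial_agent]
    for _ in range(iterations):
        pool.append(best_response(pool))  # best response to the whole pool
    return pool[-1]
```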
Sakana AI@SakanaAILabs·
Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs. Blog: sakana.ai/drq

Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War.

Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors. We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust.

Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors—mirroring how distinct species in nature often evolve similar traits to solve the same problems.

Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.

This project is a collaboration between MIT and Sakana AI led by @akarshkumar0101

Full Paper (Website): pub.sakana.ai/drq/
Full Paper (arXiv): arxiv.org/abs/2601.03335
Code: github.com/SakanaAI/drq/
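As a rough illustration of the kind of loop described above — a sketch in the spirit of the announcement, not the paper's actual algorithm or API — where `llm_mutate` and `battle` are hypothetical callbacks (an LLM proposing a revised warrior, and the Core War simulator deciding a match):

```python
import random

def red_queen_evolution(seed_warrior, llm_mutate, battle, generations, pop_size=8):
    """Toy sketch of adversarial program evolution: every warrior must
    keep adapting against an ever-shifting opponent pool just to survive."""
    population = [seed_warrior] * pop_size
    for _ in range(generations):
        next_population = []
        for warrior in population:
            opponent = random.choice(population)        # ever-changing competitor
            challenger = llm_mutate(warrior, opponent)  # LLM proposes a revision
            # The revision survives only if it wins its match.
            next_population.append(challenger if battle(challenger, opponent) else warrior)
        population = next_population
    return population
```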
Levi Lelis retweeted
Marlos C. Machado@MarlosCMachado·
The Department of Computing Science at the University of Alberta has an opening for another tenure-track faculty position in robotics. Please spread the word. I can attest to how awesome @UAlbertaCS and @AmiiThinks are! (Official job posting coming soon.)
Sebastian Risi@risi1979·
I’m beyond excited to announce our MIT Press book on Neuroevolution! An HTML version is now available for free on neuroevolutionbook.com, with a print edition coming out later in 2026.

Real intelligence is not static; it evolves. For decades, the field of neuroevolution has pursued this necessary adaptability. Our book chronicles its development, from early concepts to its modern integration with deep learning and reinforcement learning, exploring its potential for understanding the origins of intelligence and its real-world applications.

And the companion webpage is more than just a book site! It comes equipped with interactive demos, videos, exercises, and tutorials to allow everyone to experience neuroevolution in action. Check it out and let us know what you think!

It was a pleasure to work on this book over the last 4+ years with David (@hardmaru), Yujin (@yujin_tang), and Risto. We are incredibly proud of the result and look forward to celebrating! We hope to connect with many of you at NeurIPS.

We are very grateful to Melanie Mitchell (@MelMitchell1), who provided a fantastic foreword. To quote her: “The next big thing in AI is coming, and I suspect that neuroevolution will be a major part of it”. We think so too!
Lucas Vegi@lucasvegi·
It's still hard to believe! ❤️🙏
Levi Lelis retweeted
Cohere Labs@Cohere_Labs·
Join our Reinforcement Learning Group next week on Monday, September 29th for a session with Esraa Elelimy on "Deep Reinforcement Learning with Gradient Eligibility Traces." Thanks to @rahul_narava for organizing this event ✨ Learn more: cohere.com/events/cohere-…
Levi Lelis retweeted
Richard Sutton@RichardSSutton·
My acceptance speech at the Turing award ceremony:

Good evening ladies and gentlemen. The main idea of reinforcement learning is that a machine might discover what to do on its own, without being told, from its own experience, by trial and error. As far as I know, the first person to propose this was Alan Turing in 1947, which makes it particularly gratifying and humbling to receive this award in his name for reviving this essential but still nascent idea.

I have three people that I would like to particularly thank. First, Andy Barto. As my PhD supervisor he taught me my whole approach to science, and in particular instilled in me an appreciation of scholarship and craft, and of the great breadth of prior work. Second, I would like to thank Oliver Selfridge, my other main mentor; sadly, now deceased. Oliver taught me how keeping ideas simple can be the boldest of all ambitions. Third, I want to thank Martha Steenstrup, my life partner and intellectual sparring partner. She keeps me honest and grounded.

Finally, I also want to thank the University of Alberta, which has been an ideal environment for me and for reinforcement learning research these past 22 years. These three people and my university have reinforced in me the ambition to have ideas that matter, without getting too full of myself about it. They taught me that the quest for better ideas is serious, but is best approached playfully, with humility, kindness, and optimism. For this I am eternally grateful.

I would also like to thank all of you for being here and for celebrating the pursuit of intellectual excellence. Thank you very much.
Levi Lelis@levilelis·
Rina’s work has inspired me since my early days as a PhD student. I’m so happy to see her receive this very well-deserved award. Congratulations, Rina!
IJCAIconf@IJCAIconf

#IJCAI2025 What inspires her research? Rina Dechter, 2025 IJCAI Research Excellence Award recipient, takes us on a journey in her #Invited talk: Graphical Models Meet Heuristic Search: A Personal Journey into Automated Reasoning 📆 22 August, 2 PM 🌐 2025.ijcai.org/invited-talks/

Levi Lelis retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
@DimitrisPapail Test Time Compute was "invented" the same way America was "discovered".. x.com/rao2z/status/1…
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

Inference Time Computation is NOT new--we wanted to get rid of it, but are letting it back in out of necessity.. #SundayHarangue (#NeurIPS2024 workshop edition)

Noam Brown @polynoamial has been giving talks on o1 suggesting that including inference time computation was a relatively newer idea in games (which he and others have brought to LLMs). While I have a lot of respect for Noam (he is probably one of the handful of frontier folks who actually has a good understanding of pre-2013 AI), I am afraid that in this particular case, his characterization gets the chronology mostly wrong for prominent games--most of which were "deliberative" rather than "reflex" agents in Russell/Norvig's terminology.

Many games--including Chess and Go--focused exclusively on inference time compute in the beginning! (See my Intro #AI slide below..) The 1997 Deep Blue, for example, was all inference time compute (using alpha-beta pruning on shallow game trees where the leaves were evaluated by (mostly hand-coded) evaluation functions), plus a library of end games. Pre-AlphaGo approaches for Go went with just MCTS at inference time. For these games, it is the idea of learning an approximate policy off-line and using it to complement the already standard inference time computation (via generalized policy rollout) that is the later development!

TD-Gammon is more of an exception, which tried first spending time "off-line" to get a policy and make a largely reflex agent. (Partly because between Samuel's Checkers and TD-Gammon, there were few RL/Learning-based approaches for Games..)

So when Noam says he and Tuomas started poker first with off-line approximate policy learning and just using that during inference time, and then recognized that online policy rollout is actually helping, they went in the reverse order of what happened in Chess and Go! The appeal of the off-line policy computation was that you can spend unlimited amounts of time behind the scenes, so that online computation can be mostly taken out. Many of us still remember marveling at the fact that the learning time for AlphaGo would have been about 1700 years on the common desktops of that time!

This learn-everything-upfront approach with a close-to-reflex agent became so strong in the aftermath of AlphaGo that they tried their best to reduce the MCTS search that was left over in the original AlphaGo in a stream of later developments. Reducing user-facing inference time computation was a deliberate choice--and has even led to a change in the way the field started viewing computational complexity considerations (see x.com/rao2z/status/1…). This trend continued with LLMs too--with all focus on pre-training so inference time compute is kept negligible--so that it costs very little for end users, and can even be done locally on edge devices..

I suspect that the twin bitter truths (c.f. x.com/rao2z/status/1…)--that you can't quite get reasoning out of auto-regressive LLMs, and that learning an approximate pseudo-cot-action policy that is good enough would be way too costly even for OAI/Microsoft's resources--dragged them into inference time compute awkwardness (c.f. x.com/rao2z/status/1…), which certainly changes the business model by pushing the scaling costs to the edge users! (x.com/rao2z/status/1…)
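For context, the Deep Blue-style setup described above is pure inference-time search: alpha-beta pruning over a shallow game tree with a hand-coded evaluation at the leaves and no learned policy. A minimal sketch, where the `evaluate` and `children` callbacks are hypothetical placeholders for a concrete game:

```python
def alpha_beta(state, depth, alpha, beta, maximizing, evaluate, children):
    """All the 'thinking' happens at inference time: search the game tree
    to `depth`, scoring leaf positions with a hand-coded evaluation."""
    moves = children(state, maximizing)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in moves:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:   # prune: the minimizing player avoids this branch
                break
        return value
    value = float("inf")
    for child in moves:
        value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                      True, evaluate, children))
        beta = min(beta, value)
        if alpha >= beta:       # prune: the maximizing player avoids this branch
            break
    return value
```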

Ayush Jain@Ayushj240·
Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-Functions 📎openreview.net/forum?id=H3jcT… Grateful to my advisors @JosephLim_AI and @ebiyik_!
Ayush Jain@Ayushj240

At @RL_Conference🍁, I'm presenting a talk and a poster on Aug 6, Track 1: Reinforcement Learning Algorithms. We find that Deterministic Policy Gradient methods like TD3 often get stuck at local optima under complex Q-functions, and propose a novel actor architecture! 🧵
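For readers unfamiliar with the failure mode: a TD3/DDPG-style actor is trained by gradient ascent on the critic's Q, so it climbs toward the nearest local maximum of Q over actions. A minimal PyTorch-style sketch of that generic update (not the paper's proposed architecture; `actor` and `critic` are assumed to be ordinary nn.Modules and `optimizer` a torch optimizer over the actor's parameters):

```python
def dpg_actor_step(actor, critic, states, optimizer):
    """One deterministic policy gradient step: ascend Q(s, pi(s)) by
    backpropagating through the critic. If Q is multi-modal in the
    action, this local ascent can leave the actor stuck at a
    suboptimal mode -- the issue the paper targets."""
    actions = actor(states)                  # a = pi(s)
    loss = -critic(states, actions).mean()   # maximize Q  <=>  minimize -Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```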

Levi Lelis retweeted
Shao-Hua Sun@shaohua0116·
Kicking off #RLC2025 with our Workshop on Programmatic Reinforcement Learning! This workshop explores how programmatic representations can improve interpretability, generalization, efficiency, and safety in RL.
Levi Lelis retweeted
Ndea@ndea·
Are programmatic policies really better at generalizing OOD than neural policies, or are the benchmarks biased? This position paper revisits 4 prior studies and finds neural policies can match programmatic ones - if you adjust training (sparse observation, reward shaping, etc.)