Ryan Sullivan @RyanSullyvan
354 posts

Postdoc @UBC_CS with @jeffclune (RL, Curriculum Learning, Open-Endedness) | PhD from @UofMaryland | Previously RL @SonyAI_global and RLHF @Google

Joined March 2013
297 Following · 438 Followers
Ryan Sullivan retweeted
Joseph Suarez 🐡 @jsuarez
Releasing PufferLib 4.0: Train agents in seconds
Ryan Sullivan @RyanSullyvan
@abhishekunique7 This is really cool! Do you have any thoughts on how this approach relates to Go-Explore? arxiv.org/abs/2004.12919 It seems like the motivation is similar but you have a new way of generating interesting states to start from.
Abhishek Gupta @abhishekunique7
Excited to share the project that has surprised me the most in the last year! Large-scale RL in simulation, with no demos and no reward engineering, can solve dynamic, dexterous, and contact-rich tasks. The learned behaviors are reactive, forceful, and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate! You can play with the policies yourself to see: weirdlabuw.github.io/omnireset/ And the learned behavior transfers to real-world robots from RGB camera inputs! So what's the trick? Using simulator resets carefully! Let's unpack (1/10)
Ryan Sullivan retweeted
Jeff Clune @jeffclune
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature!! ✨

Today in Nature we share a comprehensive technical summary of our work on The AI Scientist, including new scaling law results showing how it improves with more compute and more intelligent foundation models.

The AI Scientist autonomously creates its own research ideas, codes up and conducts experiments to test those ideas, creates figures to visualize the results, writes an entire scientific manuscript summarizing what it has discovered, and conducts its own "peer" review of the resulting paper. One of its papers, entirely AI generated, passed peer review at a top-tier AI conference workshop, a historic milestone marking the dawn of a new era of AI-accelerated scientific discovery. 🔬🧪✨🧬💡🔭

Paper: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

Work done in collaboration with a great team from Sakana, Oxford, and my lab at UBC. Thanks and congratulations everyone! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru
Ryan Sullivan retweeted
Jenny Zhang @jennyzhangzt
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents: self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving.

We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution. Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
Ryan Sullivan @RyanSullyvan
@sethkarten Thanks for answering my questions! This is a cool project; it would be great if coding agents could make it easier for everyone to work on (currently) slow, complex environments.
Ryan Sullivan @RyanSullyvan
@sethkarten If you're saying that these environments have better reward design, that's great, but I think an exact optimized copy is a valuable first step, both to compare to prior work and to demonstrate that the system can faithfully recreate environments.
Ryan Sullivan @RyanSullyvan
@sethkarten Sorry, I should have been more specific. In Figure 4 for PokeJAX, the training curves look very different (the optimized env scores 60 Elo higher). I actually didn't see the gray line for Red, but yes, that's also very different. It seems like the envs aren't the same for training.
Akshit @akshitwt
introducing a new, very fun LLM benchmark: the Game-of-Life Bench!

the rules are simple: given an 8x8 grid following Conway's Game of Life rules, the goal is to create an initial pattern with at most 32 cells that can last the longest number of turns before dying/repeating.

some results to highlight (with caveats detailed below):
- gpt 5.1 lasts the longest with a 106-step run
- claude models are really bad at this! they refuse to reason about this task and score < 25 points
- deepseek r1 is the best open model with 102 steps

why? because i wanted to create a benchmark that has (i think) no practicality, but is still fun to look at, cheap, and still measures something interesting. i am also a big fan of the Game of Life; its absurdly simple rules leading to intractability is extremely cool to me.

also, i saw a lot of work with LLMs trying to "predict" the next state in Conway's Game of Life. i think Game-of-Life Bench is more fun because it's pretty open-ended and only asks the LLM for the initial state. i also think this could be an RL env? but idk why you would ever train on this task haha

i don't think this is a "serious" benchmark because it doesn't measure anything practical, but i still think it's a hard benchmark exactly because you can't predict what happens with your initial state many turns into the future; this is why i was initially expecting all LLMs to be bad at it, but it turns out some are clearly better than others (the ordering may surprise you!)

reminder: this is still a work in progress:
(1) i am gpu-poor so could only do 10 runs for each model, even though total running cost is relatively low. maybe with some more credits i can run more seeds for each model.
(2) i handpicked models which i think are at the frontier right now, plus some others that were on my mind. so, if you'd like to see a model on here, let me know.
(3) i currently only do an 8x8 grid because i thought that by itself would be pretty hard for current LLMs, but of course we can increase grid sizes!
(4) the coolest thing is, i don't think we can calculate the max possible number of states (yay undecidability!) you can go without repeating, so this is essentially a no-ceiling task, which is pretty cool!

again, i did this mostly out of a desire to make LLMs do something fun. if this keeps me entertained for a few more days, i'd likely release a blog post on it. if it keeps me entertained for a week (and someone sponsors me), i'll put more work into it :P

lastly, this is fully open sourced, so feel free to run this on your own!
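The scoring rule described in the post above can be sketched in a few lines. This is a hypothetical scorer, not the benchmark's actual code: it assumes a bounded 8x8 grid (cells off the board are treated as dead) and counts generations until the pattern dies out or revisits any earlier state.

```python
from collections import Counter


def step(live, size=8):
    """One Game-of-Life generation on a bounded size x size grid."""
    # Count live neighbors of every cell adjacent to a live cell.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors, survival on 2 or 3; stay on the board.
    return frozenset(
        (r, c) for (r, c), n in counts.items()
        if 0 <= r < size and 0 <= c < size
        and (n == 3 or (n == 2 and (r, c) in live))
    )


def score(initial, max_steps=10_000):
    """Generations survived before the pattern dies or repeats a state."""
    state = frozenset(initial)
    seen = {state}
    for t in range(1, max_steps + 1):
        state = step(state)
        if not state or state in seen:
            return t
        seen.add(state)
    return max_steps


# A still life repeats immediately; a period-2 blinker repeats after two steps.
block = {(0, 0), (0, 1), (1, 0), (1, 1)}   # score(block) == 1
blinker = {(3, 2), (3, 3), (3, 4)}         # score(blinker) == 2
```

Under these rules a still life scores 1 and an oscillator scores its period, so long-lived patterns must avoid both dying out and falling into any cycle.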
Ryan Sullivan retweeted
Shlok Kumar Mishra @shlokkkk
🧵 Introducing Xray-Visual (XRV): Scaling Unified Vision Models to 26 Billion Samples. 1/ Do standard vision encoders like SigLIP or DINO generalise to out-of-distribution (OOD) data at scale? While these models dominate academic leaderboards, we observe significant performance degradation when they are confronted with complex, real-world distributions.
Ryan Sullivan retweeted
Konstantinos Mitsides @k_mitsides
Can large language models (LLMs) act as the imagination of a reinforcement learning (RL) agent? We found that if you let an LLM "dream" - not by hallucinating pixels, but by writing executable Python code - it can create an open-ended curriculum that drives progress in complex, long-horizon worlds. Introducing Dreaming in Code (DiCode). 🧵👇
Ryan Sullivan @RyanSullyvan
@alexUnder_sky It's starting to look like we'll fully automate coding before LLMs can make it to Mine's End.
Ryan Sullivan @RyanSullyvan
We're quickly approaching the point where LLMs can code more complex games than they can play. It's funny to think that games, which are designed to be easily learnable and where task-specific AI already excels, might be the hardest domain for general intelligence.
Ryan Sullivan retweeted
Shengran Hu @shengranhu
Memory is probably the biggest challenge for building practical AI agents. Thrilled to share our work exploring a shift from manually defining memory for each domain → enabling agents to design better memory mechanisms for themselves. Meta-learning memory designs unlocks agents that can learn to continually learn across various tasks. Absolute joy working with @yimingxiong_ and @jeffclune on this!!

code: github.com/zksha/alma
paper: arxiv.org/pdf/2602.07755
Quoting Jeff Clune @jeffclune

Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/

Ryan Sullivan retweeted
Jeff Clune @jeffclune
Following our previous work on ADAS (x.com/shengranhu/sta…) and DGM (x.com/SakanaAILabs/s…), ALMA is a step toward AI-generating algorithms, including AI with continual learning. Looking ahead, we envision agentic systems that learn to improve all aspects of their agentic system, including their memory (i.e., combining ADAS/DGM and ALMA), learning to continually learn while solving problems in ever-changing real-world environments! The work was led by @yimingxiong_ with excellent mentorship by @shengranhu. Congrats to both on the excellent work! 7/7 🚀✨🎉🤖🧠🌍🔬

arxiv: arxiv.org/pdf/2602.07755
github: github.com/zksha/alma
website: yimingxiong.me/alma
Quoting Sakana AI @SakanaAILabs

Introducing The Darwin Gödel Machine: AI that improves itself by rewriting its own code sakana.ai/dgm

The Darwin Gödel Machine (DGM) is a self-improving agent that can modify its own code. Inspired by evolution, we maintain an expanding lineage of agent variants, allowing for open-ended exploration of the vast design space of such "self-improving" agents.

Modern agentic systems, while powerful, remain static: once deployed, their intelligence remains fixed. We believe continuous self-improvement is key to the development of stronger AI capabilities. Our Darwin Gödel Machine is built from the ground up to enable AI systems that can learn and evolve their own capabilities over time, just as humans do.

On SWE-bench, DGM automatically improved its performance from 20.0% to 50.0%. Similarly, on Polyglot, the DGM increased its success rate from an initial 14.2% to 30.7%, significantly outperforming representative hand-designed agents.

Learn more about our approach in our technical report: arxiv.org/abs/2505.22954

This work was done in collaboration with Jeff Clune (@jeffclune)'s lab at UBC, and led by his PhD students Jenny Zhang (@jennyzhangzt) and Shengran Hu (@shengranhu), together with Cong Lu (@cong_ml) and Robert Lange (@RobertTLange).

Code: github.com/jennyzzt/dgm
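The loop described above, maintaining an expanding lineage of agent variants that modify their own code, can be sketched at a high level. Everything below is a toy illustration, not the real DGM: `evaluate` and `self_modify` are hypothetical stand-ins (in the actual system a foundation model proposes edits to the agent's code and a benchmark such as SWE-bench scores it), and the real parent-selection heuristics are more sophisticated than uniform sampling.

```python
import random


def dgm_style_loop(initial_agent, evaluate, self_modify, generations=50, rng=random):
    """Toy Darwin-Goedel-Machine-style loop: keep an expanding archive of
    agent variants, pick a parent, let it produce a modified child, score
    the child, and archive it. Weak variants are kept too, which is what
    enables open-ended exploration instead of pure hill-climbing."""
    archive = [(initial_agent, evaluate(initial_agent))]
    for _ in range(generations):
        parent, _ = rng.choice(archive)            # sample a parent variant
        child = self_modify(parent)                # "rewrite its own code"
        archive.append((child, evaluate(child)))   # never discard variants
    return max(archive, key=lambda entry: entry[1])


# Toy usage: agents are integers, self-modification increments them,
# and the score is the integer itself.
best, best_score = dgm_style_loop(0, evaluate=lambda a: a,
                                  self_modify=lambda a: a + 1)
```

Keeping the whole archive (rather than only the current best) is the evolution-inspired design choice the post highlights: later breakthroughs can descend from variants that scored poorly at the time.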

Ryan Sullivan retweeted
Borja G. León @borruell
We're looking for Research Scientists and Engineers to join the AI team at @iconicgamesio in London. We have diverse positions including: Model Optimization/Efficient Inference, Open-Endedness/Reinforcement Learning, and Generative Vision & Multimodal Foundational Models.

Apply here: ats.rippling.com/en-GB/iconic/j…
Learn more about Iconic: iconicgames.io

We're a small, growing team with high ownership, crafting the minds that inhabit and shape new worlds for interactive entertainment. If you're into games, blueprinting consciousness, or simply building personas that transmit and evoke feelings and emotions beyond what any chatbot assistant could ever do, this is your place. These openings are also suitable for final-year PhD students with experience in the relevant field. Any questions, my DMs are open.
Ryan Sullivan retweeted
Roberta Raileanu @robertarail
📢 New PhD Position 📢 We (@_rockt, @borruell, and I) are looking for a PhD student to work at the intersection of open-endedness and game design. The student will be part of the @UCL_DARK lab and funded by @iconicgamesio and UCL. See this doc for a more detailed description of the research direction and candidate expectations: docs.google.com/document/d/1Z7… To apply, please complete this form by January 15: docs.google.com/forms/d/16JGfS…
Ryan Sullivan retweeted
Akarsh Kumar @akarshkumar0101
Check out our new Digital Red Queen work! Core War is a programming game where assembly programs fight against each other for control of a Turing-complete virtual machine. We ask what happens when an LLM drives an evolutionary arms race in this domain. We find that as you run our DRQ algorithm for longer, the resulting programs become more generally robust, while also showing evidence of convergence across independent runs - a sign of convergent evolution!
Quoting Sakana AI @SakanaAILabs

Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs
Blog: sakana.ai/drq

Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War.

Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors.

We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust. Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors, mirroring how distinct species in nature often evolve similar traits to solve the same problems.

Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.

This project is a collaboration between MIT and Sakana AI, led by @akarshkumar0101.

Full Paper (Website): pub.sakana.ai/drq/
Full Paper (arxiv): arxiv.org/abs/2601.03335
Code: github.com/SakanaAI/drq/

Ryan Sullivan retweeted
Jeff Clune @jeffclune
Tomorrow/Sunday, 10:15–10:35: Keynote Talk 3 (Jeff Clune) at the MindGames workshop, if you are interested. I'll try to make it fun and controversial! mindgamesarena.com