Ryan Bahlous-Boldi

86 posts

@RyanBoldi

PhD student @MIT_CSAIL | Continual RL, Open-Endedness, Evolution

Cambridge, MA · Joined April 2017

635 Following · 387 Followers

Ryan Bahlous-Boldi@RyanBoldi·28 Mar

Great work on increasing diversity in answers given by a language model 🎢

Isha Puri@ishapuri101

Ask ChatGPT several times where's best to go for spring break. It recommends Barcelona almost every time. This isn't a fluke. RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]

Ryan Bahlous-Boldi@RyanBoldi·23 Mar

Excited to be spending the week at @SakanaAILabs in Tokyo!

Ryan Bahlous-Boldi@RyanBoldi·12 Mar

Also relevant is the work on the "myth of the objective" by @kenneth0stanley and @joelbot3000 or some of our work on lexicase selection. Perhaps the best way to do well on next token prediction, and downstream tasks, is to not train on next token prediction!

Ryan Bahlous-Boldi@RyanBoldi·12 Mar

Pre-pre-training with synthetic training data is better than equivalent training with language! Sometimes a detour in your training objective can be helpful... Reminds me of this blogpost I wrote with @andreiskiii: ryanboldi.github.io/detour-blog-fi…

Seungwook Han@seungwookh

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)

Ryan Bahlous-Boldi reposted

Itamar Pres@PresItamar·5 Mar

New paper: It's time to optimize for 🔁self-consistency 🔁 We’ve pushed LLMs to the limits of available data, yet failures like sycophancy and factual inconsistency persist. We argue these stem from the same assumption: that behavior can be specified one I/O pair at a time. 🧵

Ryan Bahlous-Boldi@RyanBoldi·8 Jan

Program Synthesis🤝 Open-Ended Evolution. We let an LLM loose in Core War!

Akarsh Kumar@akarshkumar0101

Check out our new Digital Red Queen work! Core War is a programming game where assembly programs fight against each other for control of a Turing-complete virtual machine. We ask what happens when an LLM drives an evolutionary arms race in this domain. We find that as you run our DRQ algorithm for longer, the resulting programs become more generally robust, while also showing evidence of convergence across independent runs - a sign of convergent evolution!

Ryan Bahlous-Boldi@RyanBoldi·11 Oct

@akarshkumar0101 @kaixhin Congrats Akarsh! Well deserved.

Akarsh Kumar@akarshkumar0101·10 Oct

@kaixhin Thanks Kai!

Kai Arulkumaran@kaixhin·10 Oct

Congrats to @akarshkumar0101 for winning the best paper talk award at #ALIFE2025! 👏

Ryan Bahlous-Boldi@RyanBoldi·10 Oct

🔥🔥

Jyo Pari@jyo_pari

After weeks of learning about systems at @scaleml, we’re shifting gears to video foundation models. Thrilled to have @cloneofsimo sharing how to train them from scratch next Tuesday — no better person to learn from 🔥

Ryan Bahlous-Boldi reposted

Pulkit Agrawal@pulkitology·23 Sep

Introducing Perioperation: a new paradigm for collecting multimodal (vision-touch-proprioception) data for fine dexterous manipulation. See DEXOP in action -- an exoskeleton for capturing rich force and visual feedback as a human performs everyday tasks. Our design ensures such data easily transfers to a robot, unlocking historically hard tasks for robots. Find out more: dex-op.github.io Work led by @haoshu_fang

Ryan Bahlous-Boldi@RyanBoldi·5 Sep

A step towards forgetting about catastrophic forgetting!

Jyo Pari@jyo_pari

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

Ryan Bahlous-Boldi reposted

Sam Earle@Smearle_RH·27 Aug

We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.

Ryan Bahlous-Boldi reposted

Lance Ying@LanceYing42·21 Jul

A hallmark of human intelligence is the capacity for rapid adaptation, solving new problems quickly under novel and unfamiliar conditions. How can we build machines to do so? In our new preprint, we propose that any general intelligence system must have an adaptive world model, i.e. they must be able to rapidly construct or refine their internal representation through interaction and exploration — a process we call “world model induction”. We propose a roadmap for evaluating adaptive world models in machines based on a special class of games we call “novel games”.

Ryan Bahlous-Boldi@RyanBoldi·21 Jul

Visiting @FlowersINRIA this week to dive deeper into a project I’ve been working on with @ClementRomac, @cedcolas, and @pyoudeyer all about balancing exploration and exploitation in autotelic RL agents. Preview soon 👀

Ryan Bahlous-Boldi reposted

Tyler Brooke-Wilson@T_BrookeWilson·18 Jul

How do people reason while still staying coherent – as if they have an internal ‘world model’ for situations they’ve never encountered? A new paper on open-world cognition (preview at the world models workshop at #ICML2025!)

Ryan Bahlous-Boldi reposted

Graham Todd@gdrtodd_·7 Jul

Excited to introduce the first version of Ludax, a domain-specific language for board games that compiles directly into JAX code! Preprint: arxiv.org/abs/2506.22609 Code: github.com/gdrtodd/ludax

Ryan Bahlous-Boldi@RyanBoldi·8 May

@_connor_casey Everything is RL? I knew it

connor casey@_connor_casey·7 May

at umass the function of comp sci research converges @ rl....even in quantum @RyanBoldi

Ryan Bahlous-Boldi@RyanBoldi·13 Apr

Excited to share that I’ll be joining @MIT this fall as a PhD student in EECS! Grateful to everyone that has supported me along the way. Can’t wait to explore RL, evolution, learning, reasoning and intelligence in all its forms 🧠

Ryan Bahlous-Boldi@RyanBoldi·2 Apr

Check out this awesome work by @maxencefaldor !

Maxence Faldor@maxencefaldor

How can we collect the best stepping stones for open-ended discovery? I am excited to share Learned Quality-Diversity — a family of algorithms meta-optimized to be efficient stepping stone collectors! 📄: arxiv.org/abs/2502.02190 🌟: github.com/maxencefaldor/… work done in collaboration with @RobertTLange from @SakanaAILabs and @CullyAntoine.

Ryan Bahlous-Boldi@RyanBoldi·1 Apr

@harshit_sikchi @scottniekum @yayitsamyzhang @marcgbellemare @yukez @PeterStone_TX Congrats Harshit!

Harshit Sikchi@harshit_sikchi·1 Apr

Successfully defended my Ph.D. today 🎓🥳! @scottniekum and @yayitsamyzhang are the best advisors I could have ever asked for. A big thanks to my committee members @marcgbellemare @yukez @PeterStone_TX . The full presentation video will be uploaded soon... Excited about what's to come!

Ryan Bahlous-Boldi reposted

François Chollet@fchollet·20 Mar

Much of the field obsesses over end-to-end learning. But strong generalization requires compositionality: building modular, reusable abstractions, and reassembling them on the fly when faced with novelty. The models of the future won't be just pipes, they will be Lego castles.
