Michael Matthews

151 posts

@mitrma

PhD student @FLAIR_Ox

Oxford, United Kingdom · Joined December 2011

383 Following · 831 Followers

Pinned Tweet

Michael Matthews@mitrma·11 Nov

We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵

Michael Matthews retweeted

Michael Beukman@mcbeukman·3d

1/ As compute continues to grow and simulators continue to improve, it is becoming feasible to train RL agents for billions or trillions of timesteps. However, this is only useful if agents can continue learning over such long training horizons, which is far from given 👇

Michael Matthews retweeted

Tim Rocktäschel@_rockt·25 Mar

"The only unsaturated agentic intelligence benchmark in the world" Excuse me? @NetHack_LE is unsaturated since 2020.

ARC Prize@arcprize

Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

Michael Matthews retweeted

Oscar Michel@ojmichel4·26 Feb

📢Current world models aren't really modeling the world; they're modeling one agent's view of it. Partial observations ≠ world state. Future world models will be independent of any one agent's perspective. You will be able to “drop in” any number of agents at any point in time, and a persistent world state will evolve with their interactions. Imagine a neural MMORPG server. 🧵[1/10]

Michael Matthews retweeted

Benjamin Spiegel@superspeeg·22 Apr

Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇

Michael Matthews@mitrma·28 Jan

@_rockt @HenaffMikael Thanks Tim! Now if only I could BC myself...

Tim Rocktäschel@_rockt·28 Jan

@HenaffMikael @mitrma Congrats @mitrma. Outstanding achievement!

Mikael Henaff@HenaffMikael·26 Jan

my sample size of 1 is that it took me ~20min to solve all 9 levels of a random ARC-AGI 3 puzzle vs. 6 months to finish Nethack once

Tim Rocktäschel@_rockt

After ARC-AGI 3 is saturated there will still be @NetHack_LE / balrogai.com left to conquer.

Michael Matthews retweeted

Alex Goldie@AlexDGoldie·1 Dec

🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵

Michael Matthews retweeted

Jakob Foerster@j_foerst·18 Nov

My Oxford lab (@FLAIR_Ox) is hiring PhD students! If you are thinking of doing a PhD in blue-sky and -sort of crazy ambitious- ML, have a technically strong background, and love to work with others, please consider all options for joining us:
1) Direct entry (ox.ac.uk/admissions/gra…), deadline 1st of Dec AOE
2) AIMS CDT (ox.ac.uk/admissions/gra…), deadline 27th of Jan 2026 AOE
3) EIT CDT (ox.ac.uk/admissions/gra…), deadline 7th of Jan 2026 AOE
Student funding is a real constraint / concern in the UK (especially for overseas students), and by applying for these three programs you can maximize your chances of ending up in a very very special place.

Michael Matthews@mitrma·22 Oct

Interning with Mikael has been one of the best experiences of my PhD - would highly recommend this opportunity to anyone!

Mikael Henaff@HenaffMikael

I'm looking for a PhD intern for next year, co-advised with Scott Fujimoto, for a project developing sample-efficient RL algorithms for long-horizon decision-making. If you've worked on off-policy/MBRL, hierarchical RL, embodied AI, we'd love to hear from you! Contact below.

Michael Matthews retweeted

Raj Ghugare@GhugareRaj·10 Oct

Scalable learning mechanisms for agents that solve novel tasks via experience remain an open problem. We argue that a key reason is the lack of suitable benchmarks. Simply put, most of the current generation of interactive benchmarks lack diversity in the skills that could be learned from them. Presenting BuilderBench, a benchmark to accelerate research in pre-training that centers learning from experience. Website: rajghugare19.github.io/builderbench/i…

Michael Matthews@mitrma·9 Oct

A great read - and very happy to see Kinetix featured!

Nathan Benaich@nathanbenaich

🪩The one and only @stateofai 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:

Michael Matthews retweeted

Denis Tarasov@ML_is_overhyped·9 Oct

I’m asking for help. I was meant to start my PhD with @_rockt and @robertarail at UCL, but my UK background check was refused. My appeal seems unlikely to succeed, so I’m urgently searching for any PhD or research positions in academia or industry. Any help is appreciated.

Michael Matthews retweeted

Mikael Henaff@HenaffMikael·1 Oct

Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >

Michael Matthews retweeted

Matthew Jackson@JacksonMattT·23 Sep

Unifloral has been accepted as an Oral at NeurIPS 2025! Immensely grateful to my @FLAIR_Ox co-authors @uljadb99 and @JarekLiesen for pouring months of effort into this project. There’s a ton of low-hanging fruit in offline RL… If you’re looking for a project, check it out!

Matthew Jackson@JacksonMattT

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

Michael Matthews retweeted

Bartłomiej Cupiał@CupiaBart·5 Sep

Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but turns out this isn't optimal for Multi-Step RL 🤔 Why? In our new work we highlight a crucial issue with ReAct and show that we should make and follow plans instead🧵

Michael Matthews@mitrma·4 Sep

@CULLYAntoine @imperialcollege Congratulations Antoine! 🥳

Antoine Cully@CULLYAntoine·3 Sep

Almost exactly 10 years after joining @imperialcollege as a Postdoc, I am honoured to announce that I am now Professor in Machine Learning and Robotics! 👨‍🎓 🤖 My fantastic team found the best gift to celebrate this special occasion!

Michael Matthews retweeted

Sam Earle@Smearle_RH·27 Aug

We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.

Michael Matthews retweeted

Martin Klissarov@MartinKlissarov·27 Jun

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field🧵

Michael Matthews retweeted

Samuel Garcin@SamuelGarcin·13 Jun

You work on RL from pixels, and you're tired of waiting 10 hours for a DMC run to finish? Or up to 100 hours, if you add video distractors? Well, we've got you covered: PixelBrax can run your continuous control experiments from pixels in < 1 hr! Come chat with @trevormcinroe and me at RLDM poster #103 this afternoon!

Michael Matthews retweeted

Mikael Henaff@HenaffMikael·9 Jun

A couple bits of news: 1. Happy to share my first (human) NetHack ascension; next step is RL agents :) 2. I wrote a post discussing some @NetHack_LE challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/first-nethac…

Michael Matthews retweeted

Seohong Park@seohong_park·5 Jun

Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓

Discover

@NetHack_LE @cogsci_soc @_rockt @HenaffMikael @FLAIR_Ox @robertarail @uljadb99 @JarekLiesen