Michael Matthews

151 posts

@mitrma

PhD student @FLAIR_Ox

Oxford, United Kingdom · Joined December 2011
383 Following · 831 Followers
Pinned Tweet
Michael Matthews @mitrma
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
14
205
1K
161.4K
Michael Matthews retweeted
Michael Beukman @mcbeukman
1/ As compute continues to grow and simulators continue to improve, it is becoming feasible to train RL agents for billions or trillions of timesteps. However, this is only useful if agents can continue learning over such long training horizons, which is far from given 👇
5
43
325
85.4K
Michael Matthews retweeted
Oscar Michel @ojmichel4
📢Current world models aren't really modeling the world; they're modeling one agent's view of it. Partial observations ≠ world state. Future world models will be independent of any one agent's perspective. You will be able to “drop in” any number of agents at any point in time, and a persistent world state will evolve with their interactions. Imagine a neural MMORPG server. 🧵[1/10]
13
87
613
123.8K
Michael Matthews retweeted
Benjamin Spiegel @superspeeg
Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at @cogsci_soc, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇
23
177
1.1K
151.1K
Michael Matthews retweeted
Alex Goldie @AlexDGoldie
🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩 It addresses the key issues of current evals with its broad task coverage, modular file system, meta-train/meta-test split and emphasis on open-ended tasks! 🧵
1
24
109
29.8K
Michael Matthews retweeted
Jakob Foerster @j_foerst
My Oxford lab (@FLAIR_Ox) is hiring PhD students! If you are thinking of doing a PhD in blue-sky (and sort of crazy ambitious) ML, have a technically strong background and love to work with others, please consider all options for joining us:
1) Direct entry: deadline is the 1st of Dec AOE (ox.ac.uk/admissions/gra…)
2) AIMS CDT: deadline on the 27th of Jan 2026 AOE (ox.ac.uk/admissions/gra…)
3) EIT CDT: deadline on the 7th of Jan 2026 AOE (ox.ac.uk/admissions/gra…)
Student funding is a real constraint / concern in the UK (especially for overseas students), and by applying for these three programs you can maximize your chances of ending up in a very very special place.
3
30
159
14.1K
Michael Matthews retweeted
Raj Ghugare @GhugareRaj
Scalable learning mechanisms for agents that solve novel tasks via experience remain an open problem. We argue that a key reason is the lack of suitable benchmarks. Simply put, most current-generation interactive benchmarks lack diversity in the skills that could be learned from them. Presenting BuilderBench, a benchmark to accelerate research in pre-training that centers learning from experience. Website: rajghugare19.github.io/builderbench/i…
2
22
54
12.9K
Michael Matthews @mitrma
A great read - and very happy to see Kinetix featured!
Nathan Benaich @nathanbenaich

🪩The one and only @stateofai 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:

1
1
4
569
Michael Matthews retweeted
Denis Tarasov @ML_is_overhyped
I’m asking for help. I was meant to start my PhD with @_rockt and @robertarail at UCL, but my UK background check was refused. My appeal seems unlikely to succeed, so I’m urgently searching for any PhD or research positions in academia or industry. Any help is appreciated.
14
35
251
40.8K
Michael Matthews retweeted
Mikael Henaff @HenaffMikael
Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >
1
11
74
16.8K
Michael Matthews retweeted
Matthew Jackson @JacksonMattT
Unifloral has been accepted as an Oral at NeurIPS 2025! Immensely grateful to my @FLAIR_Ox co-authors @uljadb99 and @JarekLiesen for pouring months of effort into this project. There’s a ton of low-hanging fruit in offline RL… If you’re looking for a project, check it out!
Matthew Jackson @JacksonMattT

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

3
22
179
33.2K
Michael Matthews retweeted
Bartłomiej Cupiał @CupiaBart
Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but turns out this isn't optimal for Multi-Step RL 🤔 Why? In our new work we highlight a crucial issue with ReAct and show that we should make and follow plans instead🧵
5
40
173
34.9K
Antoine Cully @CULLYAntoine
Almost exactly 10 years after joining @imperialcollege as a Postdoc, I am honoured to announce that I am now Professor in Machine Learning and Robotics! 👨‍🎓 🤖 My fantastic team found the best gift to celebrate this special occasion!
31
6
239
11.5K
Michael Matthews retweeted
Sam Earle @Smearle_RH
We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.
5
40
178
34.1K
Michael Matthews retweeted
Martin Klissarov @MartinKlissarov
As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field 🧵
12
63
284
35.8K
Michael Matthews retweeted
Samuel Garcin @SamuelGarcin
You work on RL from pixels, and you're tired of waiting 10 hours for a DMC run to finish? Or up to 100 hours, if you add video distractors? Well, we've got you covered: PixelBrax can run your continuous control experiments from pixels in < 1 hr! Come chat with @trevormcinroe and me at RLDM poster #103 this afternoon!
1
3
16
1K
Michael Matthews retweeted
Mikael Henaff @HenaffMikael
A couple of bits of news: 1. Happy to share my first (human) NetHack ascension; the next step is RL agents :) 2. I wrote a post discussing some @NetHack_LE challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/first-nethac…
5
13
62
11.5K
Michael Matthews retweeted
Seohong Park @seohong_park
Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓
12
152
937
173.3K