
Matteo Gallici
@MatteoGallici
PhD Student UPC Barcelona - Reinforcement Learning
🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX!

What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning relies heavily on replay buffers and target networks, which make training computationally slow and memory-intensive on GPUs. As a result, researchers often prefer PPO, leaving Q-Learning behind 🥲

Our solution? Eliminate Q-Learning's legacy components. PQN challenges the standard DQN paradigm by training a Q-Network without replay buffers or target networks: just online Q-Learning with vectorised exploration and network normalisation (layer or batch norm). Despite its simplicity, PQN sets a strong new baseline in many single-agent and multi-agent scenarios.

Check out the thread for more details 🔥
📄 Paper: arxiv.org/abs/2407.04811
⚙️ Code: github.com/mttga/purejaxq…

A fantastic collaboration with @mattiefoxcs and @benjamin_ellis3, with the support of the amazing people at @FLAIR_Ox directed by @j_foerst. Inspired by the groundbreaking work of @_chris_lu_ on compiling entire RL pipelines on GPU: github.com/luchris429/pur…
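For readers wondering what "online Q-Learning with vectorised exploration and network normalisation" looks like concretely, here is a minimal JAX/Flax sketch of the idea. It is my own illustration, not the authors' code (see the linked repo for that): names like QNet, eps_greedy, and td_loss are hypothetical, the layer sizes and action count are arbitrary, and a plain 1-step TD target stands in for the paper's return estimator to keep the example short.

```python
# Sketch of PQN-style online Q-learning (my assumptions, not the authors' code):
# no replay buffer, no target network; normalisation layers stabilise training.
import jax
import jax.numpy as jnp
import flax.linen as nn

class QNet(nn.Module):
    num_actions: int

    @nn.compact
    def __call__(self, obs):
        x = nn.Dense(128)(obs)
        x = nn.LayerNorm()(x)   # normalisation in place of a target net's stabilising role
        x = nn.relu(x)
        x = nn.Dense(128)(x)
        x = nn.LayerNorm()(x)
        x = nn.relu(x)
        return nn.Dense(self.num_actions)(x)  # one Q-value per action

model = QNet(num_actions=6)  # hypothetical action-space size

def eps_greedy(key, q_values, eps):
    # Vectorised epsilon-greedy across a batch of parallel environments.
    key_u, key_a = jax.random.split(key)
    n_envs, n_actions = q_values.shape
    random_a = jax.random.randint(key_a, (n_envs,), 0, n_actions)
    explore = jax.random.uniform(key_u, (n_envs,)) < eps
    return jnp.where(explore, random_a, q_values.argmax(axis=-1))

def td_loss(params, obs, actions, rewards, next_obs, dones, gamma=0.99):
    # Bootstrap from the online network itself: no target-network copy.
    q_next = jax.lax.stop_gradient(
        model.apply({"params": params}, next_obs)).max(axis=-1)
    targets = rewards + gamma * (1.0 - dones) * q_next
    q_taken = jnp.take_along_axis(
        model.apply({"params": params}, obs),
        actions[:, None], axis=-1).squeeze(-1)
    return jnp.mean((q_taken - targets) ** 2)
```

Because transitions come straight from a batch of parallel environments each step and are consumed immediately, the whole collect-and-update loop can be jit-compiled end to end on the GPU, which is the point of dropping the replay buffer.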

Introducing CrossQ, just published at #ICLR2024! 🎉 CrossQ achieves:
🔥 Very fast off-policy Deep RL
📈 SOTA sample-efficiency in <5% of the gradient steps
🧹 Without target nets or Q ensembles
🧑🏻‍💻 Project and code: adityab.github.io/CrossQ
Joint work w/ @DPalenicek 🧵👇
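The tweet doesn't spell out how CrossQ gets by without target networks; my reading of the paper (Bhatt et al., ICLR 2024) is that batch normalisation is applied over a single joint forward pass of the current and next state-action batches, so the normalisation statistics cover both. Below is a rough sketch under that reading: Critic and crossq_loss are hypothetical names, and the actor, twin critics, and entropy terms of the full SAC-style agent are omitted.

```python
# Hedged sketch of the CrossQ critic update as I understand it from the
# paper; illustrative only, not the authors' implementation.
import jax
import jax.numpy as jnp
import flax.linen as nn

class Critic(nn.Module):
    @nn.compact
    def __call__(self, obs, action, train: bool):
        x = jnp.concatenate([obs, action], axis=-1)
        x = nn.Dense(256)(x)
        x = nn.BatchNorm(use_running_average=not train)(x)
        x = nn.relu(x)
        x = nn.Dense(256)(x)
        x = nn.BatchNorm(use_running_average=not train)(x)
        x = nn.relu(x)
        return nn.Dense(1)(x).squeeze(-1)

critic = Critic()

def crossq_loss(params, batch_stats, obs, act, rew, next_obs, next_act,
                done, gamma=0.99):
    # One joint pass so batch-norm statistics are computed over both the
    # (s, a) and (s', a') batches; then split the Q-values back out.
    joint_obs = jnp.concatenate([obs, next_obs], axis=0)
    joint_act = jnp.concatenate([act, next_act], axis=0)
    q_joint, updates = critic.apply(
        {"params": params, "batch_stats": batch_stats},
        joint_obs, joint_act, train=True, mutable=["batch_stats"])
    q, q_next = jnp.split(q_joint, 2, axis=0)
    target = rew + gamma * (1.0 - done) * jax.lax.stop_gradient(q_next)
    return jnp.mean((q - target) ** 2), updates["batch_stats"]
```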

