Matteo Gallici

42 posts


@MatteoGallici

PhD Student UPC Barcelona - Reinforcement Learning

Joined July 2011
101 Following · 371 Followers
Pinned Tweet
Matteo Gallici @MatteoGallici
Parallelised Q-Networks go continuous! 🚀 We’re excited to introduce PQN extended to continuous-action control — no replay buffers, no target networks, just pure online Q-learning + a deterministic actor. Now you can use PQN + MuJoCo Playground to train full robotic policies in just seconds or minutes 🤖 All running entirely on GPU 👉 Blog post: mttga.github.io/posts/pqn_cont… 👾 Codebase: github.com/mttga/purejaxq… 1/5
10 replies · 45 retweets · 250 likes · 16.4K views
Matteo Gallici @MatteoGallici
Reading pure-JAX scripts can be challenging, so we also provide simplified versions that are much easier to read and debug — especially for those not yet familiar with JAX. 👾 purejaxql simplified github.com/mttga/purejaxq… 5/5
0 replies · 0 retweets · 4 likes · 386 views
Matteo Gallici @MatteoGallici
Now, in the same repo, you can train PQN in minutes across MinAtar, Atari, Craftax, multi-agent tasks, and MuJoCo Playground — making purejaxql an extensive and convenient hub for Q-learning research. 👾 Repo: github.com/mttga/purejaxq… 4/5
0 replies · 0 retweets · 4 likes · 410 views
Matteo Gallici @MatteoGallici
PQN still means no replay buffers, no target networks — just parallelisation + network normalisation. We combine basic Q-learning with a deterministic actor, trained jointly with the critic DDPG-style. Exploration uses off-policy Gaussian noise, while stability emerges naturally from LayerNorm + large-scale parallelisation. 👉 Details: mttga.github.io/posts/pqn_cont… 3/5
0 replies · 1 retweet · 6 likes · 566 views
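For intuition, here is a minimal JAX/Flax sketch of the recipe this tweet describes. It is an illustration, not the purejaxql implementation: the network sizes, the names `Critic`, `Actor`, `losses`, and `explore`, and the 1-step TD target are all assumptions made for the example; the actual method may use multi-step returns and different architectures.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Critic(nn.Module):
    """Q(s, a) with LayerNorm; normalisation replaces the target network."""
    @nn.compact
    def __call__(self, obs, action):
        x = jnp.concatenate([obs, action], axis=-1)
        x = nn.relu(nn.LayerNorm()(nn.Dense(256)(x)))
        return nn.Dense(1)(x).squeeze(-1)

class Actor(nn.Module):
    """Deterministic policy pi(s), actions squashed to [-1, 1]."""
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        x = nn.relu(nn.LayerNorm()(nn.Dense(256)(obs)))
        return jnp.tanh(nn.Dense(self.action_dim)(x))

def losses(critic_params, actor_params, critic, actor,
           obs, act, rew, next_obs, done, gamma=0.99):
    # Critic: 1-step TD target bootstrapped through the online networks;
    # stop_gradient stands in for the usual frozen target network.
    next_a = actor.apply(actor_params, next_obs)
    next_q = critic.apply(critic_params, next_obs, next_a)
    target = rew + gamma * (1.0 - done) * jax.lax.stop_gradient(next_q)
    critic_loss = jnp.mean((critic.apply(critic_params, obs, act) - target) ** 2)
    # Actor: ascend the critic's value of its own actions, DDPG-style.
    actor_loss = -jnp.mean(critic.apply(critic_params, obs,
                                        actor.apply(actor_params, obs)))
    return critic_loss, actor_loss

def explore(actor, actor_params, obs, key, sigma=0.2):
    # Gaussian noise around the deterministic action drives exploration
    # across the many parallel environments.
    a = actor.apply(actor_params, obs)
    return jnp.clip(a + sigma * jax.random.normal(key, a.shape), -1.0, 1.0)
```

Notice that nothing here maintains a second copy of the parameters: per the blog post, stability is meant to come from LayerNorm and the scale of parallelisation rather than from target networks or replay.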
Matteo Gallici @MatteoGallici
We evaluate Actor–Critic PQN across the three main domains of MuJoCo Playground — 50 tasks in total:
1️⃣ DeepMind Control Suite – classic control: CartPole, Walker, Cheetah, Hopper
2️⃣ Locomotion – Unitree Go1, Boston Dynamics Spot, Google Barkour, Unitree H1/G1, Berkeley Humanoid, Booster T1, Robotis OP3
3️⃣ Manipulation – Franka Emika Panda, Robotiq gripper, and more. 2/5
0 replies · 0 retweets · 8 likes · 691 views
Matteo Gallici retweeted
Pablo Samuel Castro @pcastr
The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks. Thrilled to share our #ICML2025 paper led by @WalterMayor_T & @johanobandoc, with @AaronCourville, where we explore how data collection affects agents in parallelized setups. 1/
4 replies · 15 retweets · 60 likes · 8.5K views
Matteo Gallici retweeted
Jacob @jacobEkooi
📢New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345) Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
2 replies · 9 retweets · 48 likes · 5.6K views
Matteo Gallici @MatteoGallici
PQN has been accepted as a Spotlight at ICLR 2025 🎉 👉 If you didn't have time to check it out, take a look at our 5-minute blog covering PQN's key features, plus a Colab demo with JAX & PyTorch implementations: mttga.github.io/posts/pqn/ 🔎 If you want to dive deeper, we break down TD instability issues and how PQN tackles them without replay buffers or target networks in this other blog post: blog.foersterlab.com/fixing-td-part… See you in Singapore! 🇸🇬
Quoted tweet: Matteo Gallici @MatteoGallici

🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX! What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on replay buffers and target networks, making training computationally slow and memory-intensive on GPUs. As a result, researchers often prefer PPO, leaving Q-Learning behind 🥲 Our solution? Eliminate Q-Learning's legacy components. PQN challenges the standard DQN paradigm by training a Q-Network without replay buffers or target networks: just online Q-Learning with vectorised exploration and network normalization (layer or batch norm). Despite its simplicity, PQN sets a new strong baseline in many single-agent and multi-agent scenarios. Check out the thread for more details 🔥 📄 Paper: arxiv.org/abs/2407.04811 ⚙️ Code: github.com/mttga/purejaxq… A fantastic collaboration with @mattiefoxcs and @benjamin_ellis3, and the support of the amazing people at @FLAIR_Ox directed by @j_foerst. Inspired by the groundbreaking work of @_chris_lu_ on compiling entire RL pipelines in GPU: github.com/luchris429/pur…

2 replies · 7 retweets · 33 likes · 4.8K views
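To make the core idea concrete, here is a minimal sketch of a PQN-style update in JAX/Flax. It is an illustration under stated assumptions, not the authors' code: the layer widths, the helper names (`QNetwork`, `epsilon_greedy`, `td_loss`), and the 1-step TD target are invented for the example (the paper uses Q(λ) returns), but the structure shows what "no replay buffer, no target network" means in practice.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class QNetwork(nn.Module):
    """Q-network with LayerNorm; the normalisation is what keeps TD stable."""
    num_actions: int

    @nn.compact
    def __call__(self, obs):
        x = nn.relu(nn.LayerNorm()(nn.Dense(128)(obs)))
        return nn.Dense(self.num_actions)(x)   # one Q-value per action

def epsilon_greedy(key, q_values, epsilon):
    """Vectorised exploration: each parallel env explores independently."""
    key_eps, key_rand = jax.random.split(key)
    greedy = jnp.argmax(q_values, axis=-1)
    random_a = jax.random.randint(key_rand, greedy.shape, 0, q_values.shape[-1])
    explore = jax.random.uniform(key_eps, greedy.shape) < epsilon
    return jnp.where(explore, random_a, greedy)

def td_loss(params, q_net, obs, actions, rewards, next_obs, dones, gamma=0.99):
    q = q_net.apply(params, obs)                      # (num_envs, num_actions)
    q_taken = jnp.take_along_axis(q, actions[:, None], axis=1).squeeze(1)
    # Bootstrap from the *same* online network: stop_gradient stands in for
    # the frozen target network, and the batch comes straight from the
    # vectorised envs instead of a replay buffer.
    next_q = q_net.apply(params, next_obs).max(axis=-1)
    target = rewards + gamma * (1.0 - dones) * jax.lax.stop_gradient(next_q)
    return jnp.mean((q_taken - target) ** 2)
```

Transitions are consumed once, as they arrive from the vectorised environments; the only stabilisers are the normalisation layer and the width of the parallel batch.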
Matteo Gallici retweeted
Costa Huang @vwxyzjn
@creus_roger just contributed a @cleanrl_lib implementation of the Parallel Q-Networks (PQN) algorithm! 🚀 PQN is DQN without a replay buffer and target network. You can run PQN on GPU environments or vectorized environments. E.g., in envpool, PQN gets DQN's score in 1/10th the time
6 replies · 12 retweets · 97 likes · 19.4K views
Matteo Gallici retweeted
Michael Matthews @mitrma
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
14 replies · 206 retweets · 1.1K likes · 161.7K views
Matteo Gallici retweeted
Michael Beukman @mcbeukman
🏋️‍♂️Go from creating an environment to having a trained expert agent within minutes! As part of Kinetix, we are releasing an editor that can create custom physics-based RL environments, and import them seamlessly into an RL training loop. 1/
3 replies · 18 retweets · 107 likes · 31.9K views
Matteo Gallici retweeted
Jakob Foerster @j_foerst
1/🚀 @FLAIR_Ox is coming to #icml2024 in Vienna 🎉 (I am literally posting from the train) and we are very excited to share our work with you! You can find us here ⬇️✨ see below 🔗 for clickable links
1 reply · 15 retweets · 83 likes · 8.2K views
Matteo Gallici retweeted
Alex Goldie @AlexDGoldie
1/ 🤖 Learned optimization offers huge potential to automate machine learning! So why doesn't it work well in RL (and how did we fix it)?! I'm excited to share OPEN, our @AutoRL_Workshop spotlight paper exploring this question! 🧵
1 reply · 30 retweets · 116 likes · 25.1K views
Matteo Gallici retweeted
Jakob Foerster @j_foerst
DQN kick-started the field of deep RL 12 years ago, but Q-learning has recently taken a backseat compared to PPO and other on-policy methods. We introduce PQN, a greatly simplified version of DQN which is highly GPU-compatible and theoretically supported by convergence proofs.
Quoted tweet: Matteo Gallici @MatteoGallici [the PQN announcement, reproduced in full above]
1 reply · 8 retweets · 82 likes · 7.9K views
Matteo Gallici retweeted
Boris Belousov @_bbelousov
@MatteoGallici Very interesting work! It sounds very similar to our CrossQ paper x.com/aditya_bhatt/s…
Quoted tweet: Aditya Bhatt @aditya_bhatt

Introducing CrossQ, just published at #ICLR2024! 🎉 CrossQ achieves: 🔥 Very fast off-policy Deep RL 📈 with SOTA sample-efficiency in <5% of the gradient steps 🧹 without target nets or Q ensembles 🧑🏻‍💻Project and Code: adityab.github.io/CrossQ Joint work w/ @DPalenicek 🧵👇

1 reply · 4 retweets · 11 likes · 1.4K views