Ivan Masmitja
347 posts

@imasmitja
Researcher | Institut de Ciències del Mar (ICM-CSIC) @ICMCSIC | #AutonomousUnderwaterVehicles #TargetTracking #AIforUTracking





PQN has been accepted as a Spotlight at ICLR 2025 🎉 👉 If you didn't have time to check it out, take a look at our 5-minute blog post covering PQN’s key features, plus a Colab demo with JAX & PyTorch implementations: mttga.github.io/posts/pqn/ 🔎 If you want to dive deeper, we break down TD instability issues and how PQN tackles them without replay buffers or target networks in this other blog post: blog.foersterlab.com/fixing-td-part… See you in Singapore! 🇸🇬



@creus_roger just implemented Parallelised Q-Network (PQN) for @cleanrl_lib! 🚀 PQN is DQN without a replay buffer or target network. You can run PQN on GPU environments or vectorized environments. E.g., in envpool, PQN reaches DQN's score in 1/10th of the time
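To make "vectorized environments" concrete, here is a minimal sketch of epsilon-greedy action selection over a batch of parallel environments, written with NumPy. The function name `epsilon_greedy`, the batch size, and the epsilon value are illustrative assumptions, not taken from the CleanRL or PQN code.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick one action per parallel environment (hypothetical helper).

    q_values: array of shape (num_envs, num_actions) with Q estimates.
    epsilon:  probability of taking a uniformly random action.
    rng:      a numpy.random.Generator.
    """
    num_envs, num_actions = q_values.shape
    # Greedy choice for every environment at once.
    greedy = np.argmax(q_values, axis=1)
    # One random candidate action per environment.
    random_actions = rng.integers(0, num_actions, size=num_envs)
    # Each environment explores independently with probability epsilon.
    explore = rng.random(num_envs) < epsilon
    return np.where(explore, random_actions, greedy)


rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))  # 8 parallel envs, 4 actions (made-up sizes)
actions = epsilon_greedy(q, epsilon=0.1, rng=rng)
```

Because every environment draws its own exploration decision, a single step yields a batch of varied transitions, which is the role a replay buffer usually plays in DQN.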







⚪ OPEN CALL 🪼 Postdocs interested in how the distribution of biodiversity is changing due to global change? 🤖 Predocs who love underwater robotics? 🚨 At ICM we are offering several positions within the @Momentum_CSIC programme! More info here ➡️ icm.csic.es/ca/ofertes-de-…




We are investing €14.5 million in grants for 43 projects under the Pleamar Programme 🌊 → Through @FBiodiversidad 🔵 Boosting the blue economy 🔵 Sustainability of the fishing and aquaculture sector 🔵 Strengthening the protection of the marine environment More info: t.ly/extLa






🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX!
What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on replay buffers and target networks, making training computationally slow and memory-intensive on GPUs. As a result, researchers often prefer PPO, leaving Q-Learning behind 🥲
Our solution? Eliminate Q-Learning's legacy components. PQN challenges the standard DQN paradigm by training a Q-Network without replay buffers or target networks: just online Q-Learning with vectorised exploration and network normalization (layer or batch norm). Despite its simplicity, PQN sets a new strong baseline in many single-agent and multi-agent scenarios. Check out the thread for more details 🔥
📄 Paper: arxiv.org/abs/2407.04811
⚙️ Code: github.com/mttga/purejaxq…
A fantastic collaboration with @mattiefoxcs and @benjamin_ellis3, and the support of the amazing people at @FLAIR_Ox directed by @j_foerst. Inspired by the groundbreaking work of @_chris_lu_ on compiling entire RL pipelines on the GPU: github.com/luchris429/pur…
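The "no replay buffer, no target network" idea can be sketched as follows: a small layer-normalised Q-network computes the TD target from its own current parameters, on a batch collected straight from the parallel environments. This is a simplified NumPy illustration under stated assumptions, not the authors' implementation: the function names, layer sizes, discount factor, and the use of a plain one-step target are all choices made for this example, and no gradient step is shown.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalise each row to zero mean and unit variance (no learned scale)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def q_network(params, obs):
    """One hidden layer with layer norm before the ReLU (illustrative sizes)."""
    hidden = obs @ params["w1"] + params["b1"]
    hidden = np.maximum(layer_norm(hidden), 0.0)
    return hidden @ params["w2"] + params["b2"]


def td_loss(params, obs, actions, rewards, next_obs, dones, gamma=0.99):
    """One-step TD loss on a fresh batch, with no target network."""
    q = q_network(params, obs)
    q_taken = q[np.arange(len(actions)), actions]
    # The bootstrap value comes from the same online parameters; an autodiff
    # framework would treat this target as a constant when differentiating.
    next_q = q_network(params, next_obs).max(axis=1)
    target = rewards + gamma * (1.0 - dones) * next_q
    return 0.5 * np.mean((target - q_taken) ** 2)


rng = np.random.default_rng(0)
num_envs, obs_dim, hidden_dim, num_actions = 16, 4, 32, 2  # made-up sizes
params = {
    "w1": rng.normal(scale=0.1, size=(obs_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(scale=0.1, size=(hidden_dim, num_actions)),
    "b2": np.zeros(num_actions),
}
# Stand-in transitions, one per parallel environment (random data only).
obs = rng.normal(size=(num_envs, obs_dim))
next_obs = rng.normal(size=(num_envs, obs_dim))
actions = rng.integers(0, num_actions, size=num_envs)
rewards = rng.normal(size=num_envs)
dones = np.zeros(num_envs)
loss = td_loss(params, obs, actions, rewards, next_obs, dones)
```

The batch here is used once and discarded, so memory use scales with the number of parallel environments rather than with a buffer of past transitions.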









