Joseph Suarez 🐡

9.3K posts

@jsuarez

I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.

Joined March 2019

122 Following · 29.2K Followers

Pinned Tweet

Joseph Suarez 🐡@jsuarez·Apr 6

Releasing PufferLib 4.0: Train agents in seconds

1.1K

183.2K

Joseph Suarez 🐡@jsuarez·12h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

915

Joseph Suarez 🐡@jsuarez·13h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

Joseph Suarez 🐡@jsuarez·13h

If you're seeing tons of notifications for my streams: internet here has been blipping out multiple times per hour for 1-5 seconds. @X doesn't allow you to reconnect and instead launches a new stream + post

1.2K

Joseph Suarez 🐡@jsuarez·14h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

Joseph Suarez 🐡@jsuarez·14h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

848

Joseph Suarez 🐡@jsuarez·15h

@vikhyatk it doesn't work. hparam tuning algorithms do work though

1.2K

vik@vikhyatk·15h

too much time is being spent making optimizers marginally faster. what we really need is hparam-free optimizers

111

9.4K

Joseph Suarez 🐡@jsuarez·15h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

971

Joseph Suarez 🐡@jsuarez·15h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

899

Joseph Suarez 🐡@jsuarez·16h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

986

Joseph Suarez 🐡@jsuarez·17h

Reinforcement learning research with Joseph Suarez x.com/i/broadcasts/1…

1.1K

Joseph Suarez 🐡@jsuarez·17h

@j_foerst No longer! PufferLib is ground-up CUDA as of 4.0

1.1K

Jakob Foerster@j_foerst·1d

RL has largely been a consumer of a deep learning toolkit that was developed for supervised learning. In our recent work we explore RL specific hierarchical state representations that allow agents to overcome issues with low quality demonstration data.

Clarisse Wibault@ClarisseWibault

CV has CNNs, NLP has transformers - what inductive bias does RL have? How can policies generalise to regions of the dataset suffering from poor transitions? We motivate hierarchy by enabling distinct state-representations at different levels of the hierarchy @FLAIR_Ox @j_foerst

12.9K

Joseph Suarez 🐡@jsuarez·2d

@yacineMTB + 5.0 will be a short update - coming soon!

1.9K

kache@yacineMTB·2d

Pufferlib is insane. You can train neural networks to play games out of the box if you have a CUDA GPU. Like breakout, Atari games, continuous action space problems. You can go to the website right now and they have neural nets running in wasm

534

33.7K

Joseph Suarez 🐡@jsuarez·3d

@DanAdvantage yeah you deserved that flame

956

Dan Advantage@DanAdvantage·3d

yesterday i told gpt-5.5 exxxxxtra thinking fast in codex to use up all my monthly subscription to make the latest pufferlib sota breakthrough even better, and it silently determined that the problem was simply "too hard". the actual pr is below. notice the maze is not a maze...

4.4K

Joseph Suarez 🐡@jsuarez·3d

Core optimization improvements to PufferLib today:
- MinGRU h x 3h projection layer -> orthogonalize the 3 slices separately in Muon
- Replace NS with Polar Express
- mup scaling makes it easier on our sweeps to tune learning rate jointly with model size
- Aurora update on rectangular matrices (note MinGRU is square after splitting it into slices)
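
To make the first bullet concrete, here is a rough sketch of orthogonalizing the three slices of an h x 3h update separately. It is not PufferLib's code. It assumes the matrix is stored as h rows by 3h columns and split column-wise into three h x h blocks, and it uses a plain cubic Newton-Schulz iteration as a stand-in for whatever polynomial iteration the optimizer actually runs (the post says NS was replaced with Polar Express). All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(m, steps=20):
    """Approximate the orthogonal polar factor of a square matrix.

    Cubic iteration X <- 1.5 X - 0.5 X X^T X, started from M scaled by its
    Frobenius norm so that every singular value is at most 1.
    """
    norm = sum(v * v for row in m for v in row) ** 0.5
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x


def orthogonalize_slices(w, num_slices=3):
    """Split an h x (num_slices*h) matrix column-wise (assumed layout) and
    orthogonalize each square h x h slice on its own."""
    h = len(w)
    assert len(w[0]) == num_slices * h
    out = [[] for _ in range(h)]
    for k in range(num_slices):
        block = [row[k * h:(k + 1) * h] for row in w]
        q = newton_schulz(block)
        for i in range(h):
            out[i].extend(q[i])
    return out


def max_orthogonality_error(q):
    """Largest absolute entry of Q Q^T - I."""
    qqt = matmul(q, transpose(q))
    return max(abs(qqt[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(q)) for j in range(len(q)))


# Toy, well-conditioned 4 x 12 update: each slice is 2*I plus a small perturbation.
h = 4
update = [[(2.0 if j % h == i else 0.0) + 0.1 * ((i * 7 + j * 3) % 5 - 2) / 2
           for j in range(3 * h)] for i in range(h)]
result = orthogonalize_slices(update)
```

Each slice is square, so each one gets its own orthogonal factor. Orthogonalizing the whole rectangular matrix at once would not give that per-slice property.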

5.3K

Joseph Suarez 🐡@jsuarez·3d

@BovardDT There are a few folks in the puffer discord doing the competition! They have likely already ported the env to C and will be training for several hundred years in simulation. It's much more beginner friendly than you would expect, but not exactly notebook material

Bovard DT@BovardDT·3d

@jsuarez lots of people on the forums asking how to get started with an RL solution. A puffer.ai starter notebook would be very well received I suspect!

Bovard DT@BovardDT·3d

Orbit Wars just hit 3k teams! (and the self-reported RL team just took the lead). Still over a month left for folks who want to jump in! kaggle.com/competitions/o…

445

Joseph Suarez 🐡@jsuarez·3d

After a few hours of working on this, I have been unable to get it to work in PufferLib for RL. It is far less stable and far more brittle than simple cosine decay, even across different timestep budgets
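
For reference, the "simple cosine decay" baseline can be sketched as below. This is a generic cosine learning-rate schedule, not PufferLib's implementation. The function name, parameters, and the `min_lr` floor are assumptions for illustration.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step`, annealed from base_lr to min_lr along a half cosine."""
    # Clamp progress to [0, 1] so steps past the budget stay at min_lr.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# The schedule is parameterized only by the step budget, so changing
# total_steps rescales the same curve rather than changing its shape.
short_run = [cosine_decay_lr(s, 100, 0.1) for s in range(101)]
long_run = [cosine_decay_lr(s, 1000, 0.1) for s in range(1001)]
```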

2.1K

Joseph Suarez 🐡@jsuarez·3d

Sold. This guy rocks