Joseph Suarez 🐡

9.3K posts

@jsuarez

I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.

Joined March 2019
122 Following · 29.2K Followers
Pinned Tweet
Joseph Suarez 🐡 @jsuarez
Releasing PufferLib 4.0: Train agents in seconds
39
94
1.1K
183.2K
Joseph Suarez 🐡 @jsuarez
If you're seeing tons of notifications for my streams: internet here has been blipping out multiple times per hour for 1-5 seconds. @X doesn't allow you to reconnect and instead launches a new stream + post
0
0
7
1.1K
vik @vikhyatk
too much time is being spent making optimizers marginally faster. what we really need is hparam-free optimizers
9
0
93
7.7K
Jakob Foerster @j_foerst
RL has largely been a consumer of a deep learning toolkit that was developed for supervised learning. In our recent work we explore RL specific hierarchical state representations that allow agents to overcome issues with low quality demonstration data.
Quoting Clarisse Wibault @ClarisseWibault:

CV has CNNs, NLP has transformers - what inductive bias does RL have? How can policies generalise to regions of the dataset suffering from poor transitions? We motivate hierarchy by enabling distinct state-representations at different levels of the hierarchy @FLAIR_Ox @j_foerst

2
6
70
11.6K
kache @yacineMTB
Pufferlib is insane. You can train neural networks to play games out of the box if you have a CUDA GPU. Like breakout, Atari games, continuous action space problems. You can go to the website right now and they have neural nets running in wasm
23
10
535
33.3K
Dan Advantage @DanAdvantage
yesterday i told gpt-5.5 exxxxxtra thinking fast in codex to use up all my monthly subscription to make the latest pufferlib sota breakthrough even better, and it silently determined that the problem was simply "too hard". the actual pr is below. notice the maze is not a maze...
[3 images attached]
4
0
11
4.4K
Joseph Suarez 🐡 @jsuarez
Core optimization improvements to PufferLib today:
- MinGRU h x 3h projection layer -> orthogonalize the 3 slices separately in Muon
- Replace NS with Polar Express
- mup scaling makes it easier on our sweeps to tune learning rate jointly with model size
- Aurora update on rectangular matrices (note MinGRU is square after splitting it into slices)
4
6
65
5.3K
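
The first item in the post above (orthogonalizing each h x h slice of the h x 3h MinGRU projection separately) can be sketched roughly as follows. This is a minimal NumPy illustration, not PufferLib's code: the function names are hypothetical, the gradient is random placeholder data, and the orthogonalization uses the plain cubic Newton-Schulz iteration rather than Muon's tuned variant or Polar Express.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=40):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    This is a stand-in for illustration only, not Polar Express.
    """
    # Dividing by the Frobenius norm bounds every singular value by 1,
    # which keeps the iteration inside its convergence region.
    x = g / (np.linalg.norm(g) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def orthogonalize_slices(grad, num_slices=3):
    """Split an (h, num_slices * h) matrix into square slices and
    orthogonalize each slice on its own (assumed layout: slices are
    concatenated along the second axis)."""
    slices = np.split(grad, num_slices, axis=1)
    return np.concatenate(
        [newton_schulz_orthogonalize(s) for s in slices], axis=1
    )

# Placeholder gradient for an h x 3h projection (random data, assumption).
rng = np.random.default_rng(0)
h = 4
grad = rng.standard_normal((h, 3 * h))
update = orthogonalize_slices(grad)
```

Each slice of `update` is close to an orthogonal h x h matrix, whereas orthogonalizing the full h x 3h matrix at once would only make its rows orthonormal across all three slices jointly.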
Joseph Suarez 🐡 @jsuarez
@BovardDT There are a few folks in the puffer discord doing the competition! They have likely already ported the env to C and will be training for several hundred years in simulation. It's much more beginner friendly than you would expect, but not exactly notebook material
0
0
0
38
Bovard DT @BovardDT
@jsuarez lots of people on the forums asking how to get started with an RL solution. A puffer.ai starter notebook would be very well received I suspect!
1
0
0
39
Bovard DT @BovardDT
Orbit Wars just hit 3k teams! (and the self-reported RL team just took the lead). Still over a month left for folks who want to jump in! kaggle.com/competitions/o…
1
2
2
444
Joseph Suarez 🐡 @jsuarez
After a few hours of working on this, I have been unable to get it to work in PufferLib for RL. It is far less stable and far more brittle than simple cosine decay, even across different timestep budgets
1
2
36
2.1K
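
For reference, the "simple cosine decay" baseline mentioned in the post above is commonly written along these lines. This is a generic sketch, not PufferLib's scheduler: the function name, parameters, and the example budget and learning rate are all assumptions made for illustration.

```python
import math

def cosine_decay_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine learning-rate decay from base_lr down to min_lr.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Fraction of the timestep budget consumed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Half a cosine period: the factor goes from 1 at progress=0 to 0 at progress=1.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * factor

# Example schedule over an assumed budget of 100 updates.
schedule = [cosine_decay_lr(s, total_steps=100, base_lr=1e-3) for s in range(101)]
```

The schedule depends on the budget only through `step / total_steps`, so changing the timestep budget rescales the same curve instead of requiring a new hyperparameter.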