RanW

69 posts

RanW

@_RanW_

Write about machine learning and cognitive science topics at https://t.co/ufxPsqKi2Z

Joined March 2020

149 Following · 111 Followers

RanW reposted

Conor Heins@conorheins·23 Mar

pymdp 1.0.0 is here: batched, autodifferentiable, JIT-compiled active inference in JAX: github.com/infer-actively… This release brings: GPU/TPU-ready active inference; autodiff through inference, planning and learning; easy parallelization and batching with vmap()

RanW@_RanW_·5 Feb

@agarwl_ Actually R1 optimality I think

RanW@_RanW_·5 Feb

@agarwl_ Use R2 as reward shaping if multi step? Guaranteed R1 improvement I think

Rishabh Agarwal@agarwl_·5 Feb

RL twitter: If I'm optimizing two reward functions: true reward R1 and proxy dense reward R2 on the same samples/trajectories, is there a way to ensure policy improvement in R1 despite using both R1 and R2?

RanW@_RanW_·1 Jan

@Waymo into 2026💨

RanW@_RanW_·7 Dec

@SarahChieng Honestly the other run this morning was a bit too fast 😅 slow is great

Sarah Chieng@MilksandMatcha·7 Dec

If you’re still at NeurIPS, a small group of us are running tomorrow. Slow, conversational pace. Sunday 9:30 AM :)

RanW@_RanW_·3 Dec

@grx_xce Hi, can’t attend but is there a way to learn more about this?

Grace Li@grx_xce·2 Dec

I'm at NeurIPS this Thursday and Friday and hiring! I'd love to chat if you're interested in reward modeling, preference learning, inverse RL with multi-agent systems, and (finally) making HCI commercial. Come do cool things with the data at Design Arena :) luma.com/8qf2m6wt

RanW@_RanW_·30 Oct

Sometimes it’s very useful to write down the Bayes net/factor graph/causal DAG of your env. Helped me quite a lot e.g. studying Alchemy: ran-weii.github.io/2024/10/05/sim…

Pablo Samuel Castro@pcastr

🚨The Formalism-Implementation Gap in RL research🚨 Lots of progress in RL research over last 10 years, but too much performance-driven => overfitting to benchmarks (like the ALE). 1⃣ Let's advance science of RL 2⃣ Let's be explicit about how benchmarks map to formalism 1/X

RanW reposted

Harshit Sikchi@harshit_sikchi·5 Aug

At @RLBRew_RLC today we are presenting 2 works on unsupervised RL and 1 work on inverse RL. Stop by the poster session to learn more! Details below:

RanW reposted

Alec Tschantz@a_tschantz·25 Jun

New paper from @VERSESAI - AXIOM is a world model that learns to play pixel-based arcade games in minutes. Preprint: arxiv.org/abs/2505.24784 Blog: tinyurl.com/yrvj3cay Code: github.com/VersesTech/axi… 🧵

RanW@_RanW_·14 Apr

latentobservations.substack.com/p/empowerment-…

RanW@_RanW_·14 Apr

New post studying the empowerment objective for the assistance game in human-AI collaboration. What is empowerment optimizing? Is it aligned with human preference? What's the ultimate objective for human-AI collaboration? 👇

RanW@_RanW_·31 Mar

We also found some useful implementation tricks, tips, and observations along the way. These details are documented in this blog post (ran-weii.github.io/2025/03/28/cle…).

RanW@_RanW_·31 Mar

CleanIL aims to address this by gathering SOTA algos scattered all over the internet into a single repo. We implemented 7 algos as a starting point. Future plans are outlined in this blog post (latentobservations.substack.com/p/introducing-…) along with interesting use cases of IL and IRL.

RanW@_RanW_·31 Mar

Hi imitation learning friends, I am excited to introduce CleanIL (github.com/ran-weii/clean…), a repo of high quality single-file implementations of imitation learning and inverse RL algos inspired by CleanRL and built on @torchrl1.

RanW@_RanW_·12 Mar

@TimDarcet @jon_barron It’s pretty much a mixture density network: kaggle.com/code/runway/ga… And ppl have tried EM Gaussian splatting

TimDarcet@TimDarcet·12 Mar

@jon_barron I was going to ask "does it work? :o" but I realize it's super close to gaussian splatting actually, so it should work, going to try! Maybe we should train gaussian splattings with EM

TimDarcet@TimDarcet·11 Mar

lfg it's fitting

TimDarcet@TimDarcet

Damn expectation-maximisation of a GMM got hands (it's the easiest algo in stat learning im just bad)

RanW@_RanW_·7 Mar

@Laz4rz Not as complicated as your lecture but hopefully helps with intuition: latentobservations.substack.com/p/a-tutorial-o…

Lazarz@Laz4rz·6 Mar

I love this class (Reinforcement Learning), but the material is so dense that most of the stuff we go through is not even in Sutton 💀

RanW@_RanW_·23 Feb

@RichardMCNgo If you still have some lingering doubts, these might be helpful: openreview.net/forum?id=ZyQvV… latentobservations.substack.com/p/making-sense…

Richard Ngo@RichardMCNgo·23 Feb

A few weeks ago I decided to carry the Active Inference textbook everywhere I went until I managed to understand it. This has paid off in unexpected ways: at a party tonight I finally found someone who could explain expected free energy minimization to me.

RanW@_RanW_·14 Feb

Something cool would be similar to github.com/karpathy/arxiv… where one can search envs by domain and keywords

RanW@_RanW_·14 Feb

Orgs like @FaramaFound have been very helpful with curation and docs, but most widely known/easily found envs are still games and robotics. Although RL for real world is becoming increasingly popular, finding envs for your domain is still pretty hard.

RanW@_RanW_·14 Feb

RL friends! Do we have a centralized list/hub of RL envs for diverse domains, like @huggingface? Could be useful for ppl who want to use RL for real-world problems (e.g., health, science, climate, finance). If not, why? See my small list, leave a comment: tinyurl.com/real-world-rl-…
