RanW

69 posts

@_RanW_

Write about machine learning and cognitive science topics at https://t.co/ufxPsqKi2Z

Joined March 2020
149 Following · 111 Followers
RanW retweeted
Conor Heins (@conorheins)
pymdp 1.0.0 is here: batched, autodifferentiable, JIT-compiled active inference in JAX: github.com/infer-actively… This release brings GPU/TPU-ready active inference; autodiff through inference, planning and learning; and easy parallelization and batching with vmap().
RanW (@_RanW_)
@agarwl_ Actually R1 optimality I think
RanW (@_RanW_)
@agarwl_ Use R2 as reward shaping if multi step? Guaranteed R1 improvement I think
Rishabh Agarwal (@agarwl_)
RL twitter: If I'm optimizing two reward functions: true reward R1 and proxy dense reward R2 on the same samples/ trajectories, is there a way to ensure policy improvement in R1 despite using both R1 and R2?
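A minimal sketch of one standard reading of the "use R2 as reward shaping" reply above: potential-based shaping, where the dense signal enters only through a potential function and adds `gamma * phi(s') - phi(s)` to each true reward. This form is an assumption, not something the thread specifies; the chain environment, `potential`, and all names below are hypothetical. The sketch checks numerically that the shaped return differs from the true R1 return only by boundary terms.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def shape_rewards(states, rewards, potential, gamma):
    """Add the potential-based term gamma * phi(s') - phi(s) to each reward.

    states has one more entry than rewards: s_0, ..., s_T.
    """
    return [
        r + gamma * potential(s_next) - potential(s)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


# Hypothetical 1-D chain: the agent walks toward a goal at position 3.
GOAL = 3
gamma = 0.9
states = [0, 1, 2, 3]
true_rewards = [0.0, 0.0, 1.0]  # sparse R1: paid only on reaching the goal


def potential(s):
    # Hypothetical dense signal standing in for R2: negative distance to goal.
    return -abs(GOAL - s)


shaped = shape_rewards(states, true_rewards, potential, gamma)

g_true = discounted_return(true_rewards, gamma)
g_shaped = discounted_return(shaped, gamma)

# Telescoping: the shaped return equals the true return plus two boundary
# terms, the discounted final-state potential minus the start-state potential.
T = len(true_rewards)
offset = (gamma ** T) * potential(states[-1]) - potential(states[0])
assert abs(g_shaped - (g_true + offset)) < 1e-9
```

Because the offset depends only on the start state and the discounted final state, ranking policies by shaped return matches ranking them by R1 return when the final-state potential is zero (as here) or the horizon is infinite. Adding an arbitrary R2 directly to R1 does not have this property in general.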
RanW (@_RanW_)
@Waymo into 2026💨
RanW (@_RanW_)
@SarahChieng Honestly the other run this morning was a bit too fast 😅 slow is great
Sarah Chieng (@MilksandMatcha)
If you’re still at NeurIPS, a small group of us are running tomorrow. Slow, conversational pace. Sunday 9:30 AM :)
RanW (@_RanW_)
@grx_xce Hi, can’t attend but is there a way to learn more about this?
Grace Li (@grx_xce)
I'm at NeurIPS this Thursday and Friday and hiring! I'd love to chat if you're interested in reward modeling, preference learning, inverse RL with multi-agent systems, and (finally) making HCI commercial. Come do cool things with the data at Design Arena :) luma.com/8qf2m6wt
RanW retweeted
Harshit Sikchi (@harshit_sikchi)
At @RLBRew_RLC today we are presenting 2 works on unsupervised RL and 1 work on inverse RL. Stop by the poster session to learn more! Details below:
RanW (@_RanW_)
New post studying the empowerment objective for the assistance game in human-AI collaboration. What is empowerment optimizing? Is it aligned with human preference? What's the ultimate objective for human-AI collaboration? 👇
RanW (@_RanW_)
We also found some useful implementation tricks, tips, and observations along the way. These details are documented in this blog post (ran-weii.github.io/2025/03/28/cle…).
RanW (@_RanW_)
CleanIL aims to address this by gathering SOTA algos scattered all over the internet into a single repo. We implemented 7 algos as a starting point. Future plans are outlined in this blog post (latentobservations.substack.com/p/introducing-…) along with interesting use cases of IL and IRL.
RanW (@_RanW_)
Hi imitation learning friends, I am excited to introduce CleanIL (github.com/ran-weii/clean…), a repo of high-quality single-file implementations of imitation learning and inverse RL algos, inspired by CleanRL and built on @torchrl1.
TimDarcet (@TimDarcet)
@jon_barron I was going to ask "does it work? :o" but I realize it's super close to gaussian splatting actually, so it should work, going to try! Maybe we should train gaussian splattings with EM
Lazarz (@Laz4rz)
I love this class (Reinforcement Learning), but the material is so dense that most of the stuff we go through is not even in Sutton 💀
Richard Ngo (@RichardMCNgo)
A few weeks ago I decided to carry the Active Inference textbook everywhere I went until I managed to understand it. This has paid off in unexpected ways: at a party tonight I finally found someone who could explain expected free energy minimization to me.
RanW (@_RanW_)
Orgs like @FaramaFound have been very helpful with curation and docs, but the most widely known and easily found envs are still games and robotics. Although RL for the real world is becoming increasingly popular, finding envs for your domain is still pretty hard.
RanW (@_RanW_)
RL friends! Do we have a centralized list/hub of RL envs for diverse domains, like @huggingface? Could be useful for ppl who want to use RL for real-world problems (e.g., health, science, climate, finance). If not, why? See my small list and leave a comment: tinyurl.com/real-world-rl-…