Xingdong Zuo

39 posts

@XingdongZ

Solving real world decision making

Joined September 2018
99 Following · 82 Followers
Xingdong Zuo@XingdongZ·
@yoonholeee @StanfordAILab @chelseabfinn Such a cool idea! Huge congrats, Yoonho! I'm curious how feasible a text reward could be for helping VLAs in robotics. Often we care about more than just the success rates that can be quantified as a scalar value; what if we hill-climbed on a holistic text reward instead?
Xingdong Zuo@XingdongZ·
@perryadong So cool! Congrats on the amazing work, @perryadong! Out of curiosity, what might be the key difference from floq (a recent method that models Q-values via flow matching)?
Perry Dong@perryadong·
We propose Value Flows: learn return vector fields that automatically satisfy the distributional Bellman equation, and construct confidence weights from the return distribution to prioritize learning at transitions with higher return variance. (3/5)
[image]
Perry Dong@perryadong·
The goal of RL is to get high reward, but what is the best way to model rewards? Most methods learn a scalar value, which may miss information in the full return distribution. We propose using modern, flexible flow models to predict the full distribution over future rewards. (1/5)
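The thread (shown newest-first here) proposes fitting a generative flow model to the full return distribution instead of regressing its mean. As an illustrative, hedged sketch (a generic conditional flow-matching objective, not the paper's actual loss), one could train a velocity field that transports noise to empirical returns along straight paths; `velocity_fn` is a hypothetical learned model:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(velocity_fn, returns, n=256):
    """Generic conditional flow-matching loss over scalar returns
    (illustrative only; Value Flows' exact objective differs)."""
    x1 = rng.choice(returns, size=n)    # samples of the return
    x0 = rng.standard_normal(n)         # base noise samples
    t = rng.uniform(size=n)             # interpolation times in [0, 1]
    xt = (1 - t) * x0 + t * x1          # straight-line interpolant
    target_v = x1 - x0                  # velocity of the straight path
    pred_v = velocity_fn(xt, t)
    return float(np.mean((pred_v - target_v) ** 2))
```

Minimizing this trains `velocity_fn` so that integrating it from noise yields samples of the return distribution.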
Xingdong Zuo@XingdongZ·
@PyTorch @LightningAI 🧠 RL algorithms: IQL · CQL · DT · GAVE · GAS · BC + PID/BudgetPacer. HL-Gauss for distributional Q-learning. Ensemble critics (torch.vmap), stable stochastic policies (SigmoidRangeStd, BiasedSoftplus). #OfflineRL #RL
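HL-Gauss replaces a scalar regression target with a Gaussian-smoothed histogram and trains with cross-entropy. A minimal NumPy sketch of that target projection, assuming fixed bin edges and a smoothing width `sigma` (not the repo's actual implementation):

```python
import numpy as np
from math import erf, sqrt

def hl_gauss_target(y, bin_edges, sigma):
    """Project scalar targets y onto a histogram by integrating a
    Gaussian N(y, sigma^2) over each bin (the HL-Gauss construction)."""
    y = np.atleast_1d(np.asarray(y, float))
    z = (np.asarray(bin_edges, float)[None, :] - y[:, None]) / (sigma * sqrt(2))
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z))
    p = cdf[:, 1:] - cdf[:, :-1]               # Gaussian mass per bin
    return p / p.sum(axis=-1, keepdims=True)   # renormalize truncated tails

def hl_gauss_loss(logits, y, bin_edges, sigma):
    # Cross-entropy between the smoothed target histogram and predictions.
    target = hl_gauss_target(y, bin_edges, sigma)
    logp = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return float(-np.mean(np.sum(target * logp, axis=-1)))
```

The smoothing spreads each target across neighboring bins, which is what makes the classification loss behave like a well-conditioned regression.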
Xingdong Zuo@XingdongZ·
@PyTorch @LightningAI ⚙️ Rust-powered data pipeline: process massive auction logs efficiently using the @DataPolars Lazy API → RL / DT datasets. Reproducible, efficient workflow with scikit-learn-style transformers (Symlog, Winsorizer, ReturnScaledReward). Streaming, scalable, customizable.
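As a hedged sketch of what a scikit-learn-style `Symlog` transformer could look like (the `fit`/`transform`/`inverse_transform` names follow the scikit-learn convention; the actual repo code may differ):

```python
import numpy as np

class Symlog:
    """Symmetric-log transformer: symlog(x) = sign(x) * log(1 + |x|).
    Compresses heavy-tailed values (e.g. auction prices) while staying
    invertible and sign-preserving."""

    def fit(self, X, y=None):
        return self  # stateless transform, nothing to fit

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.sign(X) * np.log1p(np.abs(X))

    def inverse_transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.sign(X) * np.expm1(np.abs(X))
```

Being stateless, it composes cleanly in a pipeline and round-trips exactly, which matters when recovering rewards in their original scale.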
Xingdong Zuo@XingdongZ·
@seohong_park I observed that with DDPG+BC with a learned std (effectively SAC+BC), the Q value can modify the logstd too aggressively, leading to policy collapse. From some experiments, if I don't want to give up the learned std, second-moment regularization seems to help: github.com/zuoxingdong/rl…
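One plausible form of the second-moment regularization mentioned above, sketched in NumPy for a diagonal-Gaussian policy (the linked code may differ): penalize the gap between the policy's per-dimension second moment, mu² + std², and the second moment of the batch actions, so the critic cannot shrink the std arbitrarily.

```python
import numpy as np

def second_moment_penalty(mu, std, actions, beta=1.0):
    """Hypothetical second-moment regularizer for a Gaussian policy.
    For N(mu, std^2), E[a^2] = mu^2 + std^2; matching it to the
    dataset's E[a^2] discourages std collapse. Shapes: (batch, dim)."""
    policy_m2 = mu ** 2 + std ** 2            # per-sample 2nd moment
    data_m2 = np.mean(actions ** 2, axis=0)   # dataset 2nd moment
    return beta * float(np.mean((policy_m2.mean(axis=0) - data_m2) ** 2))
```

The penalty is zero exactly when the policy's average second moment matches the data's, so it only activates when the Q term starts distorting the std.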
Seohong Park@seohong_park·
To illustrate this, we ablate the temperatures (α) of AWR and DDPG+BC. We can clearly see that AWR is always policy-bounded (i.e., vertical color gradients), whereas DDPG+BC has two modes. Very interestingly, an in-between value (1.0) in DDPG+BC leads to the best of both worlds!
[image]
Seohong Park@seohong_park·
We use three value learning algorithms (IQL, SARSA, Contrastive RL) and three policy extraction algorithms (AWR, DDPG+BC, SfBC) in our analysis.
[image]
Xingdong Zuo@XingdongZ·
📈 Interactive visualizations
Round-robin agent rotation
Tuned classical baselines (BudgetPacer, PID)
Interactive Plotly dashboards:
• Market metrics (HHI, Gini, volatility)
• Campaign health, pacing, auction dynamics, score distributions
• RLiable metrics
[images]
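HHI and Gini are standard concentration and inequality metrics; minimal reference implementations (independent of the dashboard code) are:

```python
import numpy as np

def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared market shares,
    on a 0..1 scale (1/n for equal shares, 1.0 for a monopoly)."""
    v = np.asarray(values, dtype=float)
    shares = v / v.sum()
    return float(np.sum(shares ** 2))

def gini(values):
    """Gini coefficient via the sorted-rank formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i = 1..n."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return float(2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n)
```

Equal shares give HHI = 1/n and Gini = 0; full concentration pushes both toward 1.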
Seohong Park@seohong_park·
@or_rivlin Right, and I believe it's a great research direction to explore!
Seohong Park@seohong_park·
Most works in offline RL focus on learning better value functions. So value learning is the main bottleneck in offline RL... right? In our new paper, we show that this is *not* the case in general! Paper: arxiv.org/abs/2406.09329 Blog post: seohong.me/projects/offrl… A thread ↓
Xingdong Zuo reposted
Hojoon Lee@hojoon_ai·
🚀 Introducing SimbaV2: an RL architecture that enables compute and parameter scaling via hyperspherical normalization. Built on Soft Actor-Critic, the SimbaV2 architecture achieves SOTA results on MuJoCo, DMC, MyoSuite, and HumanoidBench! Project page: dojeon-ai.github.io/SimbaV2/ ⬇️
[image]
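Hyperspherical normalization constrains intermediate features to a sphere of fixed radius, which bounds activation norms as width and compute scale. A minimal sketch of just the projection step (simplified; see the project page for the actual SimbaV2 architecture):

```python
import numpy as np

def l2_normalize(x, scale=1.0, eps=1e-8):
    """Project each feature vector (last axis) onto a sphere of radius
    `scale`. In a hyperspherically normalized network this is applied
    after each block, so activation magnitudes stay bounded."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return scale * x / (norm + eps)
```

Because every layer's output lives on the sphere, downstream layers see inputs of constant magnitude regardless of depth or width.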
Dominique Paul@DominiqueCAPaul·
Today and tomorrow are data collection power days: 465 today with a goal of 1100 new samples total. The @huggingface recording script previously took 30-40s to process each 15s recording, but I changed it to batch process and now collection is way faster (and more fun)!
jshan@jisu_han__·
@XingdongZ @philfung @huggingface @LeRobotHF Thanks, Xingdong! Our team was busy building everything from scratch, so we didn't get much time to chat. I was really impressed by how your team trained on datasets for language practicality. Would love to team up with you sometime, maybe at another hackathon or another chance! 😊
jshan@jisu_han__·
Just joined the LeRobot Hackathon, Seoul district, and honestly, there's something special about building things with others who are genuinely curious and kind 🤖✨ Our team made a simple robot that can catch mosquitoes (a worldwide enemy 🦟). How about your team? @huggingface @LeRobotHF
Simi@sudosimi·
Making two PincOpen grippers for my SO100 arms. Hope to put them to good use at London’s LeRobot Hackathon. @pollenrobotics @LeRobotHF
GIF
Xingdong Zuo reposted
Andrej Karpathy@karpathy·
My sleep scores during recent travel were in the 90s. Now back in SF I am consistently back down to the 70s and 80s. I am increasingly convinced that this is due to traffic noise from a nearby road/intersection where I live: every ~10 min, a car, truck, bus, or motorcycle with a very loud engine passes by (some are 10X louder than others). In the later, less deep stages of sleep, it is much easier to wake and then much harder to go back to sleep.

More generally, I think noise pollution (especially in the early hours) comes at a huge societal cost that is not correctly accounted for. E.g. I wouldn't be too surprised if a single motorcycle riding through a neighborhood at 6am creates millions of dollars in damages in the form of hundreds to thousands of people who are more groggy, more moody, less creative, less energetic for the whole day, and more sick in the long term (cardiovascular, metabolic, cognitive). And I think that many people, like me, might not be aware that this is happening for a long time because 1) they don't measure their sleep carefully, and 2) your brain isn't fully conscious when waking and isn't able to make a lasting note/association in that state. I really wish future versions of Whoop (or Oura, etc.) would explicitly track and correlate noise with sleep, and raise this to the population.

It's not just traffic: e.g. in SF, as I recently found out, it is legal to begin arbitrarily loud road work or construction starting at 7am. The same goes for leaf blowers and a number of other ways of getting up to 100 dB. I ran a few Deep Research sessions and found a number of studies that have tried to isolate noise, showing depressing outcomes for cohorts of people who sleep in noisy environments, with increased risk across all of mental health (e.g. depression, bipolar disorders, Alzheimer's incidence) but also much more broadly, e.g. cardiovascular disease and diabetes.

Anyway, it took me a while to notice, and after (unsuccessfully) trying a number of mitigations I am moving somewhere quiet. But from what I've seen this is a major public health issue with little awareness and incorrect accounting by the government.