Xingdong Zuo

39 posts

@XingdongZ

Solving real world decision making

Joined September 2018
99 Following · 82 Followers
Xingdong Zuo@XingdongZ·
@yoonholeee @StanfordAILab @chelseabfinn Such a cool idea! Huge congrats, Yoonho! I'm curious how feasible a text reward could be for helping VLAs in robotics. Often we care about more than just the success rates that can be quantified as a scalar value; what if we hill-climbed on a holistic text reward instead?
Xingdong Zuo@XingdongZ·
@perryadong So cool! Congrats on the amazing work, @perryadong! Out of curiosity, what might be the key difference from floq (a recent method that models Q-values via flow matching)?
Perry Dong@perryadong·
We propose Value Flows: learn return vector fields that automatically satisfy the distributional Bellman equation, and construct confidence weights from the return distribution to prioritize learning at transitions with higher return variance. (3/5)
[image]
Perry Dong@perryadong·
The goal of RL is to get high reward, but what is the best way to model rewards? Most methods learn a scalar value, which may miss information in the full return distribution. We propose using modern, flexible flow models to predict the full distribution over future rewards. (1/5)
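The thread (shown newest-first here) proposes fitting a generative flow model to the full return distribution instead of regressing its mean. As an illustrative, hedged sketch (a generic conditional flow-matching objective, not the paper's actual loss), one could train a velocity field that transports noise to empirical returns along straight paths; `velocity_fn` is a hypothetical learned model:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(velocity_fn, returns, n=256):
    """Generic conditional flow-matching loss over scalar returns
    (illustrative only; Value Flows' exact objective differs)."""
    x1 = rng.choice(returns, size=n)    # samples of the return
    x0 = rng.standard_normal(n)         # base noise samples
    t = rng.uniform(size=n)             # interpolation times in [0, 1]
    xt = (1 - t) * x0 + t * x1          # straight-line interpolant
    target_v = x1 - x0                  # velocity of the straight path
    pred_v = velocity_fn(xt, t)
    return float(np.mean((pred_v - target_v) ** 2))
```

Minimizing this trains `velocity_fn` so that integrating it from noise yields samples of the return distribution.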
Xingdong Zuo@XingdongZ·
@PyTorch @LightningAI 🧠 RL algorithms: IQL · CQL · DT · GAVE · GAS · BC + PID/BudgetPacer. HL-Gauss for distributional Q-learning. Ensemble critics (torch.vmap), stable stochastic policies (SigmoidRangeStd, BiasedSoftplus). #OfflineRL #RL
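HL-Gauss replaces a scalar regression target with a Gaussian-smoothed histogram and trains with cross-entropy. A minimal NumPy sketch of that target projection, assuming fixed bin edges and a smoothing width `sigma` (not the repo's actual implementation):

```python
import numpy as np
from math import erf, sqrt

def hl_gauss_target(y, bin_edges, sigma):
    """Project scalar targets y onto a histogram by integrating a
    Gaussian N(y, sigma^2) over each bin (the HL-Gauss construction)."""
    y = np.atleast_1d(np.asarray(y, float))
    z = (np.asarray(bin_edges, float)[None, :] - y[:, None]) / (sigma * sqrt(2))
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z))
    p = cdf[:, 1:] - cdf[:, :-1]               # Gaussian mass per bin
    return p / p.sum(axis=-1, keepdims=True)   # renormalize truncated tails

def hl_gauss_loss(logits, y, bin_edges, sigma):
    # Cross-entropy between the smoothed target histogram and predictions.
    target = hl_gauss_target(y, bin_edges, sigma)
    logp = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return float(-np.mean(np.sum(target * logp, axis=-1)))
```

The smoothing spreads each target across neighboring bins, which is what makes the classification loss behave like a well-conditioned regression.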
Xingdong Zuo@XingdongZ·
@PyTorch @LightningAI ⚙️ Rust-powered data pipeline: process massive auction logs efficiently using the @DataPolars Lazy API → RL / DT datasets. Reproducible, efficient workflow with scikit-learn-style transformers (Symlog, Winsorizer, ReturnScaledReward). Streaming, scalable, customizable.
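As a hedged sketch of what a scikit-learn-style `Symlog` transformer could look like (the `fit`/`transform`/`inverse_transform` names follow the scikit-learn convention; the actual repo code may differ):

```python
import numpy as np

class Symlog:
    """Symmetric-log transformer: symlog(x) = sign(x) * log(1 + |x|).
    Compresses heavy-tailed values (e.g. auction prices) while staying
    invertible and sign-preserving."""

    def fit(self, X, y=None):
        return self  # stateless transform, nothing to fit

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.sign(X) * np.log1p(np.abs(X))

    def inverse_transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.sign(X) * np.expm1(np.abs(X))
```

Being stateless, it composes cleanly in a pipeline and round-trips exactly, which matters when recovering rewards in their original scale.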
Xingdong Zuo@XingdongZ·
@seohong_park I observed that with DDPG+BC with a learned std (effectively SAC+BC), the Q value can modify the logstd too aggressively, leading to policy collapse. From some experiments, if I don't want to give up the learned std, second-moment regularization seems to help: github.com/zuoxingdong/rl…
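One plausible form of the second-moment regularization mentioned above, sketched in NumPy for a diagonal-Gaussian policy (the linked code may differ): penalize the gap between the policy's per-dimension second moment, mu² + std², and the second moment of the batch actions, so the critic cannot shrink the std arbitrarily.

```python
import numpy as np

def second_moment_penalty(mu, std, actions, beta=1.0):
    """Hypothetical second-moment regularizer for a Gaussian policy.
    For N(mu, std^2), E[a^2] = mu^2 + std^2; matching it to the
    dataset's E[a^2] discourages std collapse. Shapes: (batch, dim)."""
    policy_m2 = mu ** 2 + std ** 2            # per-sample 2nd moment
    data_m2 = np.mean(actions ** 2, axis=0)   # dataset 2nd moment
    return beta * float(np.mean((policy_m2.mean(axis=0) - data_m2) ** 2))
```

The penalty is zero exactly when the policy's average second moment matches the data's, so it only activates when the Q term starts distorting the std.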
Seohong Park@seohong_park·
To illustrate this, we ablate the temperatures (α) of AWR and DDPG+BC. We can clearly see that AWR is always policy-bounded (i.e., vertical color gradients), whereas DDPG+BC has two modes. Very interestingly, an in-between value (1.0) in DDPG+BC leads to the best of both worlds!
[image]
Seohong Park@seohong_park·
We use three value learning algorithms (IQL, SARSA, Contrastive RL) and three policy extraction algorithms (AWR, DDPG+BC, SfBC) in our analysis.
[image]
Xingdong Zuo@XingdongZ·
📈 Interactive visualizations
Round-robin agent rotation
Tuned classical baselines (BudgetPacer, PID)
Interactive Plotly dashboards:
• Market metrics (HHI, Gini, volatility)
• Campaign health, pacing, auction dynamics, score distributions
• RLiable metrics
[images]
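HHI and Gini are standard concentration and inequality metrics; minimal reference implementations (independent of the dashboard code) are:

```python
import numpy as np

def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared market shares,
    on a 0..1 scale (1/n for equal shares, 1.0 for a monopoly)."""
    v = np.asarray(values, dtype=float)
    shares = v / v.sum()
    return float(np.sum(shares ** 2))

def gini(values):
    """Gini coefficient via the sorted-rank formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i = 1..n."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return float(2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n)
```

Equal shares give HHI = 1/n and Gini = 0; full concentration pushes both toward 1.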
Seohong Park@seohong_park·
@or_rivlin Right, and I believe it's a great research direction to explore!
Seohong Park@seohong_park·
Most works in offline RL focus on learning better value functions. So value learning is the main bottleneck in offline RL... right? In our new paper, we show that this is *not* the case in general! Paper: arxiv.org/abs/2406.09329 Blog post: seohong.me/projects/offrl… A thread ↓
Xingdong Zuo reposted
Hojoon Lee@hojoon_ai·
🚀 Introducing SimbaV2: an RL architecture that enables compute and parameter scaling via hyperspherical normalization. Built on Soft Actor-Critic, the SimbaV2 architecture achieves SOTA results on MuJoCo, DMC, MyoSuite, and HumanoidBench! Project page: dojeon-ai.github.io/SimbaV2/ ⬇️
[image]
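Hyperspherical normalization constrains intermediate features to a sphere of fixed radius, which bounds activation norms as width and compute scale. A minimal sketch of just the projection step (simplified; see the project page for the actual SimbaV2 architecture):

```python
import numpy as np

def l2_normalize(x, scale=1.0, eps=1e-8):
    """Project each feature vector (last axis) onto a sphere of radius
    `scale`. In a hyperspherically normalized network this is applied
    after each block, so activation magnitudes stay bounded."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return scale * x / (norm + eps)
```

Because every layer's output lives on the sphere, downstream layers see inputs of constant magnitude regardless of depth or width.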
Dominique Paul@DominiqueCAPaul·
Today and tomorrow are data collection power days: 465 today with a goal of 1100 new samples total. The @huggingface recording script previously took 30-40s to process each 15s recording, but I changed it to batch process and now collection is way faster (and more fun)!
jshan@jisu_han__·
@XingdongZ @philfung @huggingface @LeRobotHF Thanks, Xingdong! Our team was busy building everything from scratch, so we didn't get much time to chat. I was really impressed by how your team trained on datasets for language practicality. Would love to team up with you sometime, maybe at another hackathon or another chance! 😊
jshan@jisu_han__·
Just joined the LeRobot Hackathon, Seoul district, and honestly, there's something special about building things with others who are genuinely curious and kind 🤖✨ Our team made a simple robot that can catch mosquitoes (a worldwide enemy 🦟). How about your team? @huggingface @LeRobotHF
Simi@sudosimi·
Making two PincOpen grippers for my SO100 arms. Hope to put them to good use at London’s LeRobot Hackathon. @pollenrobotics @LeRobotHF
GIF
Xingdong Zuo reposted
Andrej Karpathy@karpathy·
My sleep scores during recent travel were in the 90s. Now back in SF I am consistently back down to the 70s and 80s. I am increasingly convinced that this is due to traffic noise from a nearby road/intersection where I live: every ~10 min, a car, truck, bus, or motorcycle with a very loud engine passes by (some are 10X louder than others). In the later, less deep stages of sleep, it is much easier to wake and then much harder to go back to sleep.

More generally, I think noise pollution (especially in the early hours) comes at a huge societal cost that is not correctly accounted for. E.g. I wouldn't be too surprised if a single motorcycle riding through a neighborhood at 6am creates millions of dollars in damages in the form of hundreds to thousands of people who are more groggy, more moody, less creative, less energetic for the whole day, and more sick in the long term (cardiovascular, metabolic, cognitive). And I think that many people, like me, might not be aware that this is happening for a long time because 1) they don't measure their sleep carefully, and 2) your brain isn't fully conscious when waking and isn't able to make a lasting note/association in that state. I really wish future versions of Whoop (or Oura, etc.) would explicitly track and correlate noise with sleep, and raise this to the population.

It's not just traffic: e.g. in SF, as I recently found out, it is legal to begin arbitrarily loud road work or construction starting at 7am. The same goes for leaf blowers and a number of other ways of getting up to 100 dB. I ran a few Deep Research sessions and found a number of studies that have tried to isolate noise, showing depressing outcomes for cohorts of people who sleep in noisy environments, with increased risk across all of mental health (e.g. depression, bipolar disorders, Alzheimer's incidence) but also much more broadly, e.g. cardiovascular disease and diabetes.

Anyway, it took me a while to notice, and after (unsuccessfully) trying a number of mitigations I am moving somewhere quiet. But from what I've seen this is a major public health issue with little awareness and incorrect accounting by the government.