Ching-An Cheng

124 posts

@chinganc_rl

Senior Research Scientist at @Google Research, working on usable theory and algorithms for Reinforcement Learning, Generative Optimization, and Robotics

Redmond, WA · Joined March 2020

101 Following · 1.9K Followers
Pinned Tweet
Ching-An Cheng @chinganc_rl
LLMs have been struggling to solve search and optimization at scale when feedback is stochastic. We propose a simple solution, POLCA, using text embeddings with a “provable” guarantee. Excited to see the first theoretically grounded work on LLM optimization. Kudos to @XuanfeiRen
Xuanfei Ren @XuanfeiRen

🚀 How can we make LLM-based optimization stable and scalable when the feedback signal is stochastic? Introducing POLCA: a framework for robust, scalable stochastic generative optimization. Paper: arxiv.org/abs/2603.14769 Code: github.com/rlx-lab/POLCA 🧵👇 1/

Ching-An Cheng retweeted
Nan Jiang @nanjiang_cs
I have served as AC for NeurIPS every year since 2020. Just declined (with messages adapted from @xuanalogue). At least the organizers owe the community an explanation why they are the only major ML venue adopting such a policy.
xuan (ɕɥɛn / sh-yen) @xuanalogue

I have declined volunteer review for @NeurIPSConf in light of their policy on sanctioned institutions. If you feel similarly opposed, feel free to adapt my message.

Ching-An Cheng retweeted
Dan Roy @roydanroy
Too close to home?

Junior researcher: I’m publishing papers at NeurIPS, my students are happy, but my chair says I’m “not impactful enough.” I don’t know what that means.
Senior researcher: What did you tell them you accomplished last year?
Junior: 3 top-tier papers, a new theoretical result on regret bounds, and an invited talk.
Senior: And what did they hear?
Junior: That I published 3 papers?
Senior: They heard “I added to the publication count, but didn’t bring in grants or visibility for the department.”
Junior: But regret bounds are impactful!
Senior: To who?
Junior: To… theorists?
Senior: Your chair spends 20 minutes a month justifying your position to the dean. Can they use regret bounds to argue for funding?
Junior: …probably not.
Senior: What external metrics did your work move?
Junior: One collaboration, one best paper award, and some citations. We don’t really track grant impact.
Senior: There’s the problem. Half your contributions are invisible by design.
Junior: But theory is necessary. The field would break without it.
Senior: I believe you. The dean doesn’t care.
Junior: That seems unfair.
Senior: It is unfair. It’s also how academia works. Chairs get grilled on grants, rankings, and prestige, not the long-run stability of ML theory.
Junior: So what should I do?
Senior: Reframe. “Secured $500K in funding to explore foundational algorithms” sounds better than “proved a tighter regret bound.”
Junior: But I don’t have that funding.
Senior: Then you’re fighting academic reality without weapons.
Junior: I don’t have time to write grants and still publish.
Senior: Most junior faculty don’t. That’s the trap — you get judged on impact but don’t get impact resources.
Junior: So what do I do?
Senior: Acknowledge the game is rigged, then play it anyway.
Junior: Meaning?
Senior: Build collaborations that attract funding. Tie your theory to hot applied areas. Translate your results into language deans understand.
Junior: That feels political.
Senior: Everything above a certain level is political. The choice isn’t political vs pure. It’s visible vs irrelevant.
Junior: What if my chair still doesn’t care?
Senior: Then you’ve learned your chair doesn’t know how to evaluate theory. That’s a different problem — one you solve by finding a better environment.
Junior: This is harder than just proving good theorems.
Senior: Proving good theorems is table stakes. Surviving academia while proving good theorems — that’s the actual job.
George from 🕹prodmgmt.world @nurijanian

Junior PM: I'm shipping everything on time, team loves me, but my manager says I'm "not strategic enough." I'm exhausted trying to figure out what that means.
Senior PM: What did you tell him you accomplished last quarter?
Junior PM: Delivered 5 features, reduced tech debt, improved team velocity by 15%.
Senior PM: And what did he hear?
Junior PM: That I delivered 5 features?
Senior PM: He heard "I kept the team busy with stuff that doesn't move numbers I get asked about."
Junior PM: But velocity improvement is strategic.
Senior PM: To who?
Junior PM: To... the team?
Senior PM: Your manager spends 20 minutes a week with his director explaining why you exist. Can he use velocity to justify your headcount?
Junior PM: I... probably not.
Senior PM: What business metrics did those 5 features move?
Junior PM: Three were tech debt, one was a sales request, one was compliance. We don't really measure impact on that stuff.
Senior PM: There's your problem. Half your work is invisible by design.
Junior PM: But that work was necessary. The platform would break without it.
Senior PM: I believe you. Your manager's director doesn't care.
Junior PM: That seems unfair.
Senior PM: It is unfair. It's also how companies work. Your manager gets grilled about revenue and retention, not platform stability.
Junior PM: So I should have said no to the tech debt?
Senior PM: You probably couldn't. But you should have framed it differently.
Junior PM: How?
Senior PM: "Prevented $200K in potential downtime costs" sounds better than "reduced tech debt."
Junior PM: But I don't have that number.
Senior PM: Then you're fighting organizational reality without weapons.
Junior PM: I don't have analytics support or time to instrument everything.
Senior PM: Most junior PMs don't. That's the trap - you get judged on business impact but don't get business resources.
Junior PM: So what do I do?
Senior PM: Acknowledge the game is rigged, then play it anyway.
Junior PM: Meaning?
Senior PM: Make allies in sales and marketing. They have the numbers you need. Shadow customer calls. Connect your work to their goals.
Junior PM: That feels political.
Senior PM: Everything above a certain level is political. The choice isn't political vs pure. It's visible vs irrelevant.
Junior PM: What if I try this and my manager still doesn't care?
Senior PM: Then you learn your manager doesn't know how to evaluate PM work. That's a different problem - one you solve by finding a better manager.
Junior PM: This is harder than just building good products.
Senior PM: Building good products is table stakes. Surviving organizational dysfunction while building good products - that's the actual job.

Ching-An Cheng @chinganc_rl
Happening now at #RLC2025. Join us if you’re interested in programs, agents, and RL.
Shao-Hua Sun @shaohua0116

Kicking off #RLC2025 with our Workshop on Programmatic Reinforcement Learning! This workshop explores how programmatic representations can improve interpretability, generalization, efficiency, and safety in RL.

Ching-An Cheng retweeted
Brando Miranda @BrandoHablando
🔄 We were nominated for an Oral + top 1 at the MATH-AI workshop at #ICML! 🚨 Why? ≈46% of GitHub commits are AI-generated—but can we verify they are correct? 📢 VeriBench challenges agents to turn Python into Lean code! 🧵 1/14 📃 Paper: openreview.net/forum?id=rWkGF…
Ching-An Cheng @chinganc_rl
We are organizing a workshop tomorrow at #icml25. Come join us and check out the latest on programmatic representations and agent learning.
Shao-Hua Sun @shaohua0116

Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!

Ching-An Cheng retweeted
Shao-Hua Sun @shaohua0116
Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!
Ching-An Cheng @chinganc_rl
Starting my #ICML2025. Will be here until Saturday. Looking forward to meeting everyone 😀
Ching-An Cheng @chinganc_rl
Super excited about this work done by our former intern @wanqiao_xu. We show that Learning from Language Feedback (LLF) with LLMs can be formally studied with provable no-regret learning algorithms. This result builds a foundation toward new theories for LLM learning and optimization.
Allen Nie (🇺🇦☮️) @allenainie

Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇

Ching-An Cheng retweeted
Allen Nie (🇺🇦☮️) @allenainie
Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇
Ching-An Cheng @chinganc_rl
Check out this new optimization framework (github.com/datarobot/syftr) by #DataRobot that can automatically search for "Pareto-optimal" solutions for agentic workflows. It's built on our LLM generative optimization framework #Trace. Excited to see more applications of #Trace! 😎
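The "Pareto-optimal" idea the tweet mentions can be made concrete with a minimal sketch (a hypothetical illustration, not syftr's or Trace's actual code): given candidate workflow configurations scored on two objectives, keep only those not dominated by any other, i.e. no other candidate is at least as good on both objectives and strictly better on one. Here the two objectives are assumed to be cost (lower is better) and accuracy (higher is better).

```python
def pareto_front(points):
    """Return the (cost, accuracy) pairs not dominated by any other pair.

    A point is dominated if some other point has cost <= and accuracy >=,
    with at least one strict inequality.
    """
    front = []
    for cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for c, a in points
        )
        if not dominated:
            front.append((cost, acc))
    return front

# Hypothetical candidate workflows: (cost, accuracy)
candidates = [(1.0, 0.70), (2.0, 0.80), (3.0, 0.78), (4.0, 0.90)]
print(pareto_front(candidates))  # → [(1.0, 0.7), (2.0, 0.8), (4.0, 0.9)]
```

The candidate (3.0, 0.78) drops out because (2.0, 0.80) is both cheaper and more accurate; the remaining three form the trade-off curve a user would pick from.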
Ching-An Cheng retweeted
Shao-Hua Sun @shaohua0116
Our ICML & RLC workshops welcome contributions using programmatic representations as policies, reward functions, skill libraries, task generators, environment models, etc., to improve interpretability, generalization, efficiency, & safety in agent learning & RL! Please retweet 🙏
Ching-An Cheng @chinganc_rl
Workshop on Programmatic Reinforcement Learning (RLC 2025)
- Web page: prl-workshop.github.io
- Submission Deadline: May 30, 2025, AoE
- Author Notification: June 15, 2025, AoE
- Workshop Date: August 5, 2025 @ Edmonton, Canada
Ching-An Cheng @chinganc_rl
We're organizing workshops on Programmatic Representation for Agent Learning at the upcoming #ICML2025 and #RLC2025. We welcome contributions using programs as policies, reward functions, skill libraries, task generators, environment models, and more! See you soon! 😀
Ching-An Cheng @chinganc_rl
Started my new job at #Google Research recently. Super excited about what can be done here. 😎
Ching-An Cheng retweeted
RL_Conference @RL_Conference
The RLC accepted workshops list is out (link in next tweet)! Programmatic RL, Causal RL, RL and videogames, Inductive biases and RL; and returning from last year: RL beyond rewards, finding the frame, and RL in practice!
Nan Jiang @nanjiang_cs
@guitarstring7 If by gradient you mean the stochastic gradient, then yes, it does change. And yes, the direction of the expected gradient doesn't.
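The distinction in this reply can be illustrated with a toy least-squares problem (a hypothetical sketch, not from the thread): individual minibatch gradients fluctuate from draw to draw, but their average tracks the full-batch ("expected") gradient and, in one dimension, shares its sign, i.e. its direction.

```python
import random

random.seed(0)

# Toy 1-D least-squares problem: y ≈ 2x plus noise,
# with objective f(w) = mean((w*x - y)^2) over the data.
xs = [random.uniform(-1, 1) for _ in range(1000)]
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in xs]
w = 0.0  # current iterate (true minimizer is near w = 2)

def grad(batch, w):
    # d/dw of mean((w*x - y)^2) over the batch
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

full = grad(data, w)  # full-batch ("expected") gradient

# Stochastic gradients from small minibatches fluctuate in value...
minis = [grad(random.sample(data, 10), w) for _ in range(200)]
spread = max(minis) - min(minis)

# ...but their average stays close to the full gradient, so the
# direction of the expected gradient is stable even though each
# individual stochastic gradient changes.
avg = sum(minis) / len(minis)
print(f"full={full:.3f} avg={avg:.3f} spread={spread:.3f}")
```

Running this shows a noticeable spread across minibatch gradients while the minibatch average lands close to the full-batch value with the same sign.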