Tim Vieira
@xtimv
machine learning, reinforcement learning, programming languages, handstands (he/him)

Many LM applications may be formulated as targeting some (Boolean) constraint. Generate a…
- Python program that passes a test suite
- PDDL plan that satisfies a goal
- CoT trajectory that yields a positive reward
The list goes on… How can we efficiently satisfy these? 🧵👇
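
For reference, the simplest way to hit a Boolean constraint is plain rejection sampling: draw, check, retry. The sketch below is only that baseline, not the adaptive weighted method named in the paper title further down. The `sample` and `constraint` callables, the `max_tries` parameter, and the toy bit-string task are all hypothetical stand-ins for an LM and a real checker such as a test suite.

```python
import random


def rejection_sample(sample, constraint, max_tries=1000):
    """Draw from `sample` until `constraint` accepts; return (output, tries)."""
    for tries in range(1, max_tries + 1):
        y = sample()
        if constraint(y):
            return y, tries
    raise RuntimeError(f"no accepted sample in {max_tries} tries")


def make_toy_sampler(rng, length=4):
    """Hypothetical stand-in for an LM: uniform random bit strings."""
    return lambda: [rng.randint(0, 1) for _ in range(length)]


def exactly_two_ones(bits):
    """Hypothetical Boolean constraint standing in for 'passes the tests'."""
    return sum(bits) == 2


if __name__ == "__main__":
    rng = random.Random(0)
    y, tries = rejection_sample(make_toy_sampler(rng), exactly_two_ones)
    print(y, tries)
```

The expected number of tries is one over the acceptance probability, so this baseline gets expensive exactly when the constraint is rarely satisfied.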



Outstanding paper 🏆 1: Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling openreview.net/forum?id=3BmPS…

Current KL estimation practices in RLHF can produce high-variance estimates and even negative values! We propose a provably better estimator that only takes a few lines of code to implement. 🧵👇
w/ @xtimv and Ryan Cotterell
paper: arxiv.org/pdf/2504.10637
code: github.com/rycolab/kl-rb
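
To make the variance and negativity point concrete, here is a minimal sketch on a toy problem. It assumes the "rb" in the repository name refers to Rao-Blackwellization, that is, replacing each sampled log-ratio with the exact per-step KL over the vocabulary given the sampled prefix. It is not taken from the linked code; the two toy models `p_next` and `q_next`, the two-token vocabulary, and the fixed length are invented for illustration.

```python
import math
import random
from itertools import product
from statistics import mean, pvariance

VOCAB = ("a", "b")
LENGTH = 3  # fixed sequence length for the toy models


def p_next(prefix):
    """Toy model p: next-token distribution depends on the last token."""
    if prefix and prefix[-1] == "a":
        return {"a": 0.7, "b": 0.3}
    return {"a": 0.4, "b": 0.6}


def q_next(prefix):
    """Toy model q: same structure as p, different probabilities."""
    if prefix and prefix[-1] == "a":
        return {"a": 0.5, "b": 0.5}
    return {"a": 0.2, "b": 0.8}


def sample_from_p(rng):
    y = []
    for _ in range(LENGTH):
        dist = p_next(y)
        y.append(rng.choices(VOCAB, weights=[dist[w] for w in VOCAB])[0])
    return y


def mc_term(y):
    """Plain Monte Carlo term: log p(y) - log q(y). Can be negative."""
    return sum(
        math.log(p_next(y[:t])[y[t]] / q_next(y[:t])[y[t]])
        for t in range(LENGTH)
    )


def step_kl(p, q):
    """Exact KL between two next-token distributions."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def rb_term(y):
    """Rao-Blackwellized term: sum of exact per-step KLs along the prefix."""
    return sum(step_kl(p_next(y[:t]), q_next(y[:t])) for t in range(LENGTH))


def exact_kl():
    """Ground truth by enumerating every sequence (toy sizes only)."""
    total = 0.0
    for y in map(list, product(VOCAB, repeat=LENGTH)):
        prob = math.prod(p_next(y[:t])[y[t]] for t in range(LENGTH))
        total += prob * mc_term(y)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_from_p(rng) for _ in range(5000)]
    mc = [mc_term(y) for y in samples]
    rb = [rb_term(y) for y in samples]
    print("exact:", exact_kl())
    print("MC   mean/var:", mean(mc), pvariance(mc))
    print("RB   mean/var:", mean(rb), pvariance(rb))
```

Both estimators target the same KL, but each Rao-Blackwellized term is a sum of exact KLs and so cannot be negative, whereas individual Monte Carlo terms can. With a real LM the per-step distributions are already available from the forward pass used for sampling.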
