

Labib Tazwar Rahman
60 posts

@LabibIsHere
life shouldn’t be too simple. you need to be friction maxxing. adjunct faculty @stanford

Big news! @Stanford is merging @StanfordHAI & Stanford Data Science into a single institute, led by @landay. Continuing under the HAI name, the institute seeks to advance AI & data science for discovery, transform education, and shape AI’s societal impact: news.stanford.edu/stories/2026/0…

Exploration is the lifeblood of learning from experience. An agent must search broadly to uncover successful behaviors. It should continue exploring to expand its capabilities by learning distinct strategies for complex problems. Threading this needle between exploration and exploitation is critical for solving unsolved problems at test-time. An algorithm should encourage (1) optimistically exploring reasoning strategies, and (2) achieving a synergy between exploration and exploitation. Towards that end, we develop Poly-EPO: a method for training LMs to explore and reason. Work with @ifdita_hasan (co-lead), Shreya, @ShirleyYXWu, @HengyuanH, @noahdgoodman, @DorsaSadigh, and @chelseabfinn. 🧵

how is this a class? absolutely insane line-up

Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n



