Rasool Fakoor

608 posts

@rasoolfa

Building agents that reason, adapt, and act with RL and friends!

Joined December 2012
1.4K Following · 443 Followers
Pinned Tweet
Rasool Fakoor @rasoolfa
Too many RL ideas die at the edge of the LLM/VLM/VLA training stack. Not anymore. With FeynRL, new algorithm ideas do not have to fight the whole stack 🚀. Focus on the algorithm while still training very large models. github.com/FeynRL-project… Try it, 🌟 it, send feedback.
Rasool Fakoor @rasoolfa
Once you check it out, you’ll see the difference immediately. As #ICLR2026 wraps up, this might be a good starting point for your next idea, startup, project, or conference submission.
Rasool Fakoor @rasoolfa
One thing I keep hearing is that RL for L(L)Ms is "mostly a systems problem now" and the RL part is basically good enough. I really don’t buy that. Current RL algs are still fragile as hell. Better systems help, but they don’t magically make the RL problem go away.
Rasool Fakoor @rasoolfa
Are you working on RL, principled ways to build RL envs for agent training, or effective evaluation for agents? Want to showcase your NeurIPS submission, or just discuss research more broadly? Then consider submitting to and attending our first-ever workshop on Methods and RL Environments for Evaluating AI Agents. Deadline: May 11. rl-eval.github.io
Jonas Mueller @jomulr

📢 Call for papers: Workshop on Methods and Reinforcement Learning Environments for Evaluating AI Agents @ ACM CAIS 2026 (inaugural edition!) Topics include:
- Design principles for effective RL environments
- Methods to evaluate agents, esp. causal/interventional techniques

Rasool Fakoor @rasoolfa
@novasarc01 @oneill_c Well, we released one, but we want to focus back on RL rather than on systems. The goal is to provide a clean framework that people understand and can use to build new RL algorithms without having to deal with convoluted code. Take a look and you'll see the difference. github.com/FeynRL-project…
Charlie O'Neill @oneill_c
Just one more RL training library bro. I promise bro just one more library and we'll fix async and decoupled training/inference and off-policiness bro. Please bro just one more
Mario Zechner @badlogicgames
really want to start some post-training experiments. what's the best place to learn about the latest and greatest? my knowledge is ca. end of 2023. specifically interested in agentic capabilities.
Rasool Fakoor retweeted
Rasool Fakoor @rasoolfa
@agarwl_ Tinker has built a somewhat unbiased estimator; the root cause of the discrepancy is mainly off-policyness. In my experience, as you address factors like nondeterminism, even when their effects seem negligible, the behavior becomes more predictable, especially in RL.
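One standard way to mitigate the off-policyness mentioned in the reply above is a per-token importance-ratio correction between the trainer and generator policies. A minimal, hypothetical sketch (this is not Tinker's actual estimator; the log-probabilities and the clipping constant are illustrative):

```python
import math

def importance_weights(train_logprobs, gen_logprobs, clip=2.0):
    """Per-token importance ratios pi_train(x) / pi_gen(x), computed from
    log-probabilities and clipped (PPO-style truncation) to bound the
    variance that off-policyness adds to the gradient estimate."""
    return [min(math.exp(t - g), clip) for t, g in zip(train_logprobs, gen_logprobs)]

# Hypothetical logprobs of the same sampled tokens under trainer vs. generator.
print(importance_weights([-0.5, -1.0], [-0.6, -2.0]))
```

When the two engines agree, the ratios stay near 1 and the correction is a no-op; the clip only bites on tokens where the policies have drifted apart.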
Rishabh Agarwal @agarwl_
I started playing around a bit with Tinker for RL runs on Qwen3 models, and one thing I'm impressed by is the small KL discrepancy between the generator and trainer across dense and MoE models. This is 10x smaller than what I typically observe for Qwen dense models if I naively combine an off-the-shelf inference engine, specifically @vllm_project, with popular training frameworks (torchtitan or Megatron). I guess there might be a couple of reasons for this:
- LoRA instead of full weights?
- Dealing with non-determinism (Thinky post by @cHHillee)
- Anything else? Maybe higher precision at certain layers (final matmul a la Minimax, routing layers in MoEs)
It'd be great for RL stability if off-the-shelf combinations of vLLM + training frameworks also led to a similarly small discrepancy.
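The generator/trainer KL discrepancy discussed above can be estimated directly from the per-token log-probabilities each engine assigns to the sampled tokens: for tokens drawn from the generator p, the mean of log p − log q is an unbiased Monte Carlo estimate of KL(p‖q). A minimal sketch with made-up numbers, not tied to vLLM or any training framework's API:

```python
def kl_generator_vs_trainer(gen_logprobs, train_logprobs):
    """Monte Carlo estimate of KL(generator || trainer): for tokens
    sampled from the generator, E[log p_gen - log p_train] = KL."""
    assert len(gen_logprobs) == len(train_logprobs)
    return sum(g - t for g, t in zip(gen_logprobs, train_logprobs)) / len(gen_logprobs)

# Hypothetical logprobs of the same sampled tokens under each engine;
# tiny numerical mismatches accumulate into a nonzero KL estimate.
gen = [-1.20, -0.35, -2.10, -0.80]
train = [-1.21, -0.36, -2.08, -0.81]
print(kl_generator_vs_trainer(gen, train))
```

Logging this quantity per rollout batch is a cheap way to watch generator/trainer drift during an RL run.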
Rasool Fakoor @rasoolfa
Our team is *hiring* interns & researchers! We're a small team of hardcore researchers & engineers working on foundation models, agentic methods, and embodiment. If you have strong publications and related experience, please fill out the application form. forms.gle/4bUeFfksUhCLap…
Rasool Fakoor @rasoolfa
@roydanroy It should really be called "surprising results of using RL with Qwen models" rather than "LLM and RL", because the conclusions drawn so far (spurious rewards, etc.) mainly apply to the Qwen model family, not to other models.
Rasool Fakoor retweeted
Tianwei Ni @twni2016
Can we make LLMs reason effectively without a huge inference-time cost? We show a powerful approach through learning and forgetting! Our recipe:
1️⃣ Aggregate reasoning paths from diverse sources: Chain-of-Thought, inference-time search (Tree-of-Thought, Reasoning-via-Planning), classic algorithms (BFS, DFS)
2️⃣ Learn successful reasoning paths ✅ while forgetting failed reasoning paths ❌ at the same time, which we call Unlikelihood Fine-Tuning (UFT)
3️⃣ A small learning rate is crucial to preserve inference-time search capabilities
Results on challenging math games, Countdown & Game-of-24:
⚡ 180× faster inference than the search-based baseline
📈 Beats CoT and inference-time search (ToT, RAP)
📄 Paper: arxiv.org/abs/2504.11364
💻 Code & data: github.com/twni2016/llm-r…
Work completed during my internship at @AmazonScience. Thank you to my co-authors @allen_a_nie @Sapana_007 @yaoliucs Huzefa Rangwala @rasoolfa!
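The learn/forget step in the UFT recipe above can be sketched as a combined likelihood/unlikelihood objective. This is a toy illustration of the idea, not the paper's implementation; the per-token probabilities and the `alpha` weight are hypothetical:

```python
import math

def uft_loss(success_probs, failure_probs, alpha=1.0):
    """The likelihood term (-log p) pushes up tokens on successful
    reasoning paths; the unlikelihood term (-log(1 - p)) pushes down
    tokens on failed paths. `alpha` trades off the two terms."""
    likelihood = -sum(math.log(p) for p in success_probs)
    unlikelihood = -sum(math.log(1.0 - p) for p in failure_probs)
    return likelihood + alpha * unlikelihood

# Hypothetical model probabilities for tokens on a successful path
# and on a failed path.
print(uft_loss(success_probs=[0.9, 0.8], failure_probs=[0.6, 0.4]))
```

The unlikelihood term goes to zero only as the model assigns vanishing probability to failed-path tokens, which is what distinguishes this from plain fine-tuning on the successful paths alone.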
Rasool Fakoor retweeted
Ke Yang @EmpathYang
Excited to announce that our web agent paper, AgentOccam, has been accepted to ICLR 2025! 🏂🏂🏂 Huge thanks to all collaborators! 😊 Special thanks to my brilliant and considerate mentor, Yao @yaoliucs, for your constant guidance and encouragement! Sapana @Sapana_007 and Rasool @rasoolfa, your insightful support has been invaluable. Huzefa, your unwavering support as our manager has been instrumental in our success. Pratik and George, your invaluable suggestions have greatly enriched our work. 📸: a recent photo capturing a sense of the Chinese phrase "大隐隐于市" (roughly, "the true hermit hides in the bustling city"). #ICLR2025 #webagent
Ke Yang @EmpathYang

👾 Introducing AgentOccam: Automating Web Tasks with LLMs! 🌐 AgentOccam showcases the impressive power of Large Language Models (LLMs) on web tasks, without any in-context examples, new agent roles, online feedback, or search strategies. 🏄🏄🏄
🧙 Link: arxiv.org/abs/2410.13825
🧐 By refining the observation and action spaces, AgentOccam achieves groundbreaking zero-shot performance, outperforming previous methods on the WebArena benchmark. This simple yet effective approach underlines the importance of aligning these spaces closely with LLM capabilities for enhanced efficiency. 📈
✨ Highlights:
- AgentOccam leads with a 29.4% improvement over the state-of-the-art method SteP, and a 161% boost in success rate compared to the vanilla agent. 🤖
- Achievements made possible without complicating the process with additional examples or strategies. 🚫
- All our replication work, prompts, and evaluator error rectifications are transparently shared in the appendix. 📚
🌟 Special thanks to my super brilliant and considerate mentor Yao and Rasool, our supportive manager Huzefa, and the invaluable suggestions and contributions from Sapana, Pratik, and George. Your guidance and support have been pivotal in this journey! #AgentOccam #LLM #WebAutomation #AI

Rasool Fakoor @rasoolfa
@_arohan_ Well, many problems in LMs can simply be attributed to the training-inference gap. This has been largely overlooked due to the assumption that the large size of the models renders this gap insignificant. That assumption, however, is incorrect. arxiv.org/abs/2410.14655
Anya Sims @anyaasims
🎉 Excited to share our paper "The Edge-of-Reach Problem in Offline MBRL" has been accepted to #NeurIPS! 🌟 Looking forward to Vancouver! We reveal why offline MBRL methods work (or fail) and introduce a robust solution: RAVL 🚀 🧵 Let's dive in! [1/N]