Rasool Fakoor (@rasoolfa) - Twitter Profile

Pinned Tweet

Too many RL ideas die at the edge of the LLM/VLM/VLA training stack. Not anymore. With FeynRL, new algorithm ideas do not have to fight the whole stack 🚀. Focus on the alg while still training very large models. github.com/FeynRL-project… Try it, 🌟 it, send feedback.

Rasool Fakoor@rasoolfa·1d

Once you check it out, you’ll see the difference immediately. As #ICLR2026 wraps up, this might be a good starting point for your next idea, startup, project, or conference submission.

Rasool Fakoor@rasoolfa·3d

One thing I keep hearing is that RL for L(L)Ms is "mostly a systems problem now" and the RL part is basically good enough. I really don’t buy that. Current RL algs are still fragile as hell. Better systems help, but they don’t magically make the RL problem go away.

Rasool Fakoor retweeted

Jonas Mueller@jomulr·13 Apr

Our organizing team is excited for a productive day of discussion with you on May 26: @natashajaques, @TheAndiPenguin, @rasoolfa, @anishathalye, @migballesteros, @aagohary, Aziza Mirsaidova, Priyaranjan Pattnayak, Ahmed Elgohary, Alina Gavrilov, Aparna Elangovan, Graham Horwood

Rasool Fakoor@rasoolfa·13 Apr

Are you working on RL, principled ways to build RL envs for agent training, or effective evaluation for agents? Want to showcase your NeurIPS submission? Or just discuss research more broadly? Then consider submitting to and attending our first-ever workshop on Methods and RL Environments for Evaluating AI Agents. Deadline: May 11. rl-eval.github.io

Jonas Mueller@jomulr

📢 Call for papers: Workshop on Methods and Reinforcement Learning Environments for Evaluating AI Agents @ ACM CAIS 2026 (inaugural edition!) Topics include: - Design principles for effective RL Environments - Methods to evaluate Agents, esp. causal/interventional techniques

Rasool Fakoor@rasoolfa·3 Apr

@novasarc01 @oneill_c Well, we released one, but we want to put the focus back on RL rather than on systems. The goal is to provide a clean framework that people can understand and use to build new RL algorithms without having to deal with convoluted code. Take a look and you'll see the difference: github.com/FeynRL-project…

λux@novasarc01·2 Apr

@oneill_c 🤣🤣🤣

Charlie O'Neill@oneill_c·2 Apr

Just one more RL training library bro. I promise bro just one more library and we'll fix async and decoupled training/inference and off-policiness bro. Please bro just one more

Rasool Fakoor@rasoolfa·3 Apr

@ClementDelangue @badlogicgames I'd suggest trying this for post-training: github.com/FeynRL-project… Things are built to be clear and modular, yet you can still run large-scale experiments. Take a look and you will see the difference!

clem 🤗@ClementDelangue·2 Apr

@badlogicgames github.com/huggingface/trl + huggingface.co/blog/async-rl-… huggingface.co/blog/unsloth-j… You'll have to create an account on HF to share your experiments btw!

Mario Zechner@badlogicgames·2 Apr

really want to start some posttraining experiments. what's the best place to learn about the latest and greatest? my knowledge is ca. end of 2023. specifically interested in agentic capabilities.

Rasool Fakoor@rasoolfa·12 Mar

Want to build your own voice agent? 🎙️ Want to learn, have fun, and win prizes? 🏆 Then this hackathon is built for you! Join now and start creating. 🚀

BosonAI@boson_ai

At the Boson AI Higgs Audio Hackathon in collaboration with @Eigen_AI_Labs , you’ll work with ultra-low latency inference, expressive prosody modelling, advanced voice cloning + audio understanding with our model served by Eigen AI. Apply today: luma.com/3vnw0e0q

Rasool Fakoor retweeted

Tianwei Ni@twni2016·29 Oct

This work was recently accepted by TMLR! openreview.net/forum?id=RF6ra… Besides our main contributions in our previous post, below are our additional insights in this TMLR version when applying preference-based and unlearning-based methods to LLM math reasoning:

Tianwei Ni@twni2016

Can we make LLMs reason effectively without a huge inference time cost? We show a powerful approach through learning and forgetting!
Our recipe:
1️⃣ Aggregate reasoning paths from diverse sources: Chain-of-Thought, inference-time search (Tree-of-Thought, Reasoning-via-Planning), classic algorithms (BFS, DFS)
2️⃣ Learn successful reasoning paths ✅ while forgetting failed reasoning paths ❌ at the same time, which we call Unlikelihood Fine-Tuning (UFT)
3️⃣ Small learning rate is crucial to preserve inference-time search capabilities
Results on challenging math games, Countdown & Game-of-24:
⚡180× faster inference than search-based baseline
📈Beats CoT and inference-time search (ToT, RAP)
📄 Paper: arxiv.org/abs/2504.11364
💻 Code & data: github.com/twni2016/llm-r…
Work completed during my internship at @AmazonScience. Thank you to my co-authors @allen_a_nie @Sapana_007 @yaoliucs Huzefa Rangwala @rasoolfa!
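Editor's sketch of step 2️⃣, for readers who want to see the shape of "learn and forget" in code. The post does not give the exact UFT objective, so this assumes the commonly used unlikelihood term, -log(1 - p), for tokens on a failed path, added to ordinary negative log-likelihood on a successful path. The function names, the `alpha` weight, and the probabilities are all hypothetical and not taken from the paper.

```python
import math

def sequence_nll(token_probs):
    """Likelihood term: negative log-probability of a successful path."""
    return -sum(math.log(p) for p in token_probs)

def sequence_unlikelihood(token_probs, eps=1e-8):
    """Unlikelihood term: grows as the model puts more mass on a failed path."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs)

def combined_loss(success_probs, failure_probs, alpha=1.0):
    """Learn the successful path while forgetting the failed one.

    alpha is a hypothetical weight on the forgetting term (an assumption).
    """
    return sequence_nll(success_probs) + alpha * sequence_unlikelihood(failure_probs)

# Made-up per-token probabilities the model assigns to each path.
good_path = [0.9, 0.8, 0.95]
bad_path = [0.6, 0.7, 0.5]
loss = combined_loss(good_path, bad_path)
```

Under this assumed objective, the loss falls when the model raises its probability on the successful path or lowers it on the failed path.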

Rasool Fakoor@rasoolfa·7 Oct

@agarwl_ Tinker has built a somewhat unbiased estimator, where the root cause is mainly off-policyness. Based on my experience, as you address factors like nondeterminism, even when their effects seem negligible, the behavior becomes more predictable, especially in RL.

English

0

2

584

Rishabh Agarwal@agarwl_·7 Oct

I started playing around a bit with Tinker for RL runs on Qwen3 models, and one thing I'm impressed by is the small KL discrepancy between the generator and trainer across dense and MoE models. This is 10x smaller than what I typically observe for Qwen dense models if I were to naively combine an off-the-shelf inference engine, specifically @vllm_project, with popular training frameworks (torchtitan or Megatron). I guess there might be a couple of reasons for this:
- LoRA instead of full weights?
- Dealing with non-determinism (Thinky post by @cHHillee)
- Anything else? Maybe higher precision at certain layers (final matmul ala Minimax, routing layers in MoEs)
It'd be great for RL stability if off-the-shelf vllm + training frameworks would also lead to a similarly small discrepancy.
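Editor's sketch of how a generator/trainer KL discrepancy like this can be measured. The post does not say which estimator was used. This assumes you have, for each sampled token, its log-probability under the inference engine and under the training framework, and it computes two standard Monte Carlo estimates of KL(generator || trainer). The function name and the numbers are hypothetical.

```python
import math

def kl_estimates(gen_logprobs, train_logprobs):
    """Estimate KL(generator || trainer) from tokens sampled by the generator.

    Returns (k1, k3):
      k1 is the mean of log p_gen - log p_train (unbiased, but can be
         negative on a finite sample);
      k3 is the mean of (r - 1) - log r with r = p_train / p_gen, which is
         non-negative for every token.
    """
    if len(gen_logprobs) != len(train_logprobs) or not gen_logprobs:
        raise ValueError("need two non-empty sequences of equal length")
    k1_sum = 0.0
    k3_sum = 0.0
    for lg, lt in zip(gen_logprobs, train_logprobs):
        log_r = lt - lg                        # log(p_train / p_gen)
        k1_sum += lg - lt
        k3_sum += math.expm1(log_r) - log_r    # (r - 1) - log r
    n = len(gen_logprobs)
    return k1_sum / n, k3_sum / n

# Made-up per-token log-probabilities of the same sampled tokens.
gen = [-0.10, -1.20, -0.35, -2.00]
train = [-0.12, -1.15, -0.40, -2.10]
k1, k3 = kl_estimates(gen, train)
```

If both stacks assigned identical probabilities to every sampled token, both estimates would be exactly zero. Any gap means the trainer is effectively learning from slightly off-policy samples.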

Rasool Fakoor@rasoolfa·9 Aug

The application closes on Tuesday (8/12). If you are interested, please apply and don't wait until the last minute.

Rasool Fakoor@rasoolfa

Our team is *hiring* interns & researchers! We’re a small team of hardcore researchers & engineers working on foundation models, agentic methods, and embodiment. If you have strong publications and related experience, plz fill out application form. forms.gle/4bUeFfksUhCLap…

Rasool Fakoor@rasoolfa·29 May

@roydanroy It should really be called "surprising results of using RL with Qwen models" rather than "LLM and RL", because the conclusions drawn so far (with spurious rewards, etc.) mainly apply to the Qwen model family, not to other models.

Dan Roy@roydanroy·29 May

Not an ideal situation!

Shashwat Goel @ ICLR'26@ShashwatGoel7

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the pre-RL models and realized they were severely underreported across papers. We compiled discrepancies in a blog below🧵👇

Rasool Fakoor retweeted

Ke Yang@EmpathYang·23 Jan

Excited to announce that our web agent paper, AgentOccam, has been accepted to ICLR 2025! 🏂🏂🏂 Huge thanks to all collaborators! 😊 Special thanks to my brilliant and considerate mentor, Yao @yaoliucs, for your constant guidance and encouragement! Sapana @Sapana_007 and Rasool @rasoolfa, your insightful support has been invaluable. Huzefa, your unwavering support as our manager has been instrumental in our success. Pratik and George, your invaluable suggestions have greatly enriched our work. 📸: a recent photo capturing some senses of the Chinese phrase "大隐隐于市". #ICLR2025 #webagent

Ke Yang@EmpathYang

👾 Introducing AgentOccam: Automating Web Tasks with LLMs! 🌐
AgentOccam showcases the impressive power of Large Language Models (LLMs) on web tasks, without any in-context examples, new agent roles, online feedback, or search strategies. 🏄🏄🏄
🧙 Link: arxiv.org/abs/2410.13825
🧐 By refining the observation and action spaces, AgentOccam achieves a groundbreaking zero-shot performance, outperforming previous methods on the WebArena benchmark. This simple yet effective approach underlines the importance of aligning these spaces closely with LLM capabilities for enhanced efficiency. 📈
✨ Highlights:
- AgentOccam leads with a 29.4% improvement over state-of-the-art methods SteP, and a 161% boost in success rate compared to the vanilla agent. 🤖
- Achievements made possible without complicating the process with additional examples or strategies. 🚫
- All our replication work, prompts, and evaluator error rectifications are transparently shared in the appendix. 📚
🌟 Special thanks to my super brilliant and considerate mentor Yao and Rasool, our supportive manager Huzefa, and the invaluable suggestions and contributions from Sapana, Pratik, and George. Your guidance and support have been pivotal in this journey!
#AgentOccam #LLM #WebAutomation #AI

Rasool Fakoor@rasoolfa·26 Dec

@_arohan_ Well, many problems in LMs can be attributed simply to the gap between training and inference. This has been largely overlooked due to the assumption that the large size of the models renders this gap insignificant. This assumption, however, is incorrect. arxiv.org/abs/2410.14655

rohan anil@_arohan_·26 Dec

arxiv.org/pdf/2404.19737

Rasool Fakoor@rasoolfa·3 Dec

@anyaasims @cong_ml @j_foerst @yeewhye interesting paper indeed @anyaasims. We found "similar" results in model-free batch RL, where we attribute the sources of all problems to what we term 'extra-overestimation'. Also, compare fig 3 in your paper with our figure 1, practically the same. arxiv.org/abs/2102.09225

Anya Sims@anyaasims·3 Dec

📍 Come chat with us! Wed 11 Dec, 4:30 PM 📍 East Exhibit Hall A-C #4603 📄 Paper: buff.ly/3CVpI4W 💻Code: buff.ly/3CVpHOq Huge thanks to my amazing co-authors @cong_ml @j_foerst @yeewhye! 🥰🥰 See you at NeurIPS! [N/N]

Anya Sims@anyaasims·3 Dec

🎉 Excited to share our paper "The Edge-of-Reach Problem in Offline MBRL" has been accepted to #NeurIPS! 🌟 Looking forward to Vancouver! We reveal why offline MBRL methods work (or fail) and introduce a robust solution: RAVL 🚀 🧵 Let's dive in! [1/N]
