Scott

6 posts

@Scott3131493885

Joined March 2026
8 Following · 2 Followers
Scott @Scott3131493885
The experimental results show that the Rollback strategy enables RL training on extremely hard agentic tasks where the agent initially never completes the task end to end.
Scott @Scott3131493885
Are you also struggling with RL on long-horizon, high-difficulty agentic tasks, especially when positive rewards are sparse? Check out the latest blog from the ROLL team: warm-pajama-44a.notion.site/Save-Load-and-…