Vaskar Nath
@vaskar_n
Researcher @ Scale AI

For online RL, we introduce Guide, a class of algorithms that incorporate guidance into the model’s context when all rollouts fail and adjust the importance sampling ratio so that the policy is optimized for contexts in which guidance is no longer present.
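
The idea in this post can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the "Hint:" prompt format, the reward threshold, and the exact form of the ratio (current policy on the unguided context over sampling policy on the guided context) are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def all_rollouts_failed(rewards: Sequence[float], threshold: float = 0.0) -> bool:
    """Return True when no rollout scored above the (assumed) success threshold."""
    return all(r <= threshold for r in rewards)


def build_context(prompt: str, guidance: Optional[str] = None) -> str:
    """Append guidance to the prompt; the 'Hint:' format is hypothetical."""
    if guidance is None:
        return prompt
    return f"{prompt}\n\nHint: {guidance}"


def choose_context(prompt: str, guidance: str, rewards: Sequence[float]) -> str:
    """Add guidance only when every unguided rollout failed."""
    if all_rollouts_failed(rewards):
        return build_context(prompt, guidance)
    return build_context(prompt)


def adjusted_ratio(logp_current_unguided: float, logp_sampler_guided: float) -> float:
    """Assumed importance ratio for a token sampled WITH guidance.

    Numerator: current policy's log-prob given the prompt without guidance.
    Denominator: sampling policy's log-prob given the prompt with guidance.
    Optimizing with this ratio targets the unguided context.
    """
    return math.exp(logp_current_unguided - logp_sampler_guided)
```

For example, `choose_context("Solve x+1=2", "isolate x", [0.0, 0.0])` returns the guided prompt, while the same call with rewards `[1.0, 0.0]` returns the prompt unchanged.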

Our researchers at Scale have developed a novel method to evaluate LLM output during generation instead of waiting until it’s complete — like a GPS recalculating when you go off route, before you’re at the wrong place. Learn more on the Scale blog: bit.ly/aligning-chatb…

Today we’re releasing the results on ToolComp, a Scale AI SEAL leaderboard that tests the ability of agents to plan, reason, and compose multiple dependent tool calls. OpenAI models lead, with Claude showing strong performance in the Chat setting. 1/🛠️🤖