Binfeng Xu

641 posts

@billxbf

AI @nvidia | agent RL for computer-use. Retiring myself with continual learning. Opinions are mine.

NY Joined May 2022

221 Following 969 Followers

Pinned

Binfeng Xu@billxbf·26 May

Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change. Find a problem, design the harness, and train your own agents! 🧵

English

144

903

131K

Binfeng Xu@billxbf·2d

@JimSZ7 yes, you can always create noisy environments and tasks during training to upsample rollouts under failure modes.

English

Jim_SZ🇭🇰@JimSZ7·2d

@billxbf PRM fixes credit assignment given the trace. The harder gap is coverage. Crash recovery and state repair are off distribution from clean rollouts, so they rarely get sampled and the PRM never scores them. You almost have to inject faults to get the traces worth crediting.

English

Binfeng Xu@billxbf·2d

@JimSZ7 that’s why you need PRM for credit assignment

English

Jim_SZ🇭🇰@JimSZ7·2d

@billxbf Treating the harness as a black box is right for rollouts. The catch is what the reward sees. A clean trace shows whether one run succeeded, not whether it recovers after a crash mid step or keeps state coherent across hours. Those modes never show in a clean rollout.

English

Binfeng Xu reposted

Kimi.ai@Kimi_Moonshot·4d

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai

English

623

1.7K

13.8K

2.4M

Binfeng Xu@billxbf·3d

@zhuzilinallen rush b

Indonesia

175

zhuzilin@zhuzilinallen·3d

ZXX

338

Binfeng Xu@billxbf·3d

@rdesh26 @willccbb guarantee that’s true

English

Desh Raj@rdesh26·3d

@willccbb Unfortunately, no job title containing "engineer" or "scientist" has ever been considered hot in NYC :)

English

648

will brown@willccbb·4d

NYC’s hottest new job title is Principal Agent Engineer

English

103

9.9K

Binfeng Xu@billxbf·4d

@Teknium @ryanvogel need /evolve rather than /learn for such cases

English

268

Teknium 🪽@Teknium·4d

@ryanvogel Hermes

English

3.3K

vogel@ryanvogel·4d

I still think we need some /learn command. I just burned 1.3M fable tokens on some weird expo bug, then fixed it, but then spawned a new session and fable made the same mistake AGAIN. There needs to be some kind of internal repo stackoverflow reference guide

English

230

75.9K

Binfeng Xu@billxbf·5d

@natolambert Congrats!

English

157

Nathan Lambert@natolambert·5d

I quickly became friends with Arcee's leadership and can't help but root for their humble approach to building the open ecosystem. No nonsense licenses, no projecting, just enabling broad access to efficient intelligence. I'm happily supporting their research as an advisor.

Arcee.ai@arcee_ai

We are thrilled to announce that @natolambert is joining Arcee as a Research Advisor. Nathan’s work and thought leadership have been instrumental to the open model ecosystem, and his guidance comes at a critical time as open builders face growing pressure. This is a major addition for Arcee and the American OS movement. Nathan brings the conviction, taste, and technical depth this moment calls for.

English

831

60.8K

Binfeng Xu@billxbf·5d

@agarwl_ thanks for the todo list!

English

857

Rishabh Agarwal@agarwl_·5d

Anthropic: What are you going to do about it? - Complain on Twitter? - Switch to Codex? - Train your own frontier model !?

SemiAnalysis@SemiAnalysis_

BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filter our GPU inference research and programming 😭

English

263

44K

Binfeng Xu@billxbf·6d

@suchenzang Agree with everything here. But what bothers many people isn’t that it’s a business, but the use of obviously dishonest narratives like safety as cover. The issue is less the decision itself and more the framing.

English

424

Susan Zhang@suchenzang·6d

anthropic doesn't owe anyone "frontier capabilities". none of the labs do. they are all simply selling a product, or a story, that people pay for. that aside, the more telling bit is how far anthropic is willing to go to secure a narrative around "capability slowdown", post a massive raise, before an ipo, and with enterprise contracts rising for those rich enough to pay to similarly keep up the image of "powered-by/secured-by agentic AI". with the amount of capex spent so far, this was never meant to be some democratizing technology "for the people". this is all simply just business.

English

1.3K

145.1K

Binfeng Xu reposted

Nathan Lambert@natolambert·6d

Why I think Anthropic's uneven safety policies with the release of Claude Fable 5 undermine the broader AI community's cohesion and accelerate us to more uncertainty and risk in AI's near-term evolution. interconnects.ai/p/claude-fable…

English

407

36.3K

Binfeng Xu@billxbf·6d

@gneubig @xwang_lk There’s gonna be the oss coalition 🙂

English

1.4K

Graham Neubig@gneubig·6d

First they came for the model builders... I feel we're getting a glimpse of a future where AI is only provided to a privileged few, and that's not a future I want to live in.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community. also the fact that this is on purpose not visible to the user is crazy

English

104

846

69.7K

Binfeng Xu reposted

elie@eliebakouch·6d

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community. also the fact that this is on purpose not visible to the user is crazy

Claude@claudeai

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

English

359

646

5.6K

3.9M

Binfeng Xu@billxbf·7 Jun

@giffmana hard to sustain open research without a business

English

292

Lucas Beyer (bl16)@giffmana·7 Jun

Do I understand it correctly that the OLMo from-scratch series is coming to an end? If so, looks like NVIDIA stepped up just in time with Nemotron models as the only remaining fully-open (ie not just weight drop) from-scratch LLM team.

English

470

81.5K

Binfeng Xu@billxbf·5 Jun

@slime_framework should advantage estimation live in the rollout or the training part? I increasingly feel that the algorithm level (how you handle multi-trace, assign reward, and estimate advantage) is cleanest if bundled within rollout, while the trainer is simplified to a backprop machine.

English

468

slime@slime_framework·5 Jun

Most RL frameworks are moving from “engine mode” to “server mode”. slime goes one step further: the RL job does not need to own the rollout servers at all. Bring your own SGLang fleet, already deployed and managed by your serving system. slime connects to it, registers it with the router, generates rollouts, and syncs updated actor weights via NCCL or disk-based full/delta transport. This is the deployment shape we believe large-scale agentic RL is moving toward: training and inference as independently managed systems, connected by a clean rollout + weight-sync contract.

English

143

25.1K

Binfeng Xu@billxbf·4 Jun

check out our new open-source ultra3 …and data, env, reward model, infra, recipe, and tech report!

Oleksii Kuchaiev@kuchaev

Our post-training pipeline is a substantial redesign from Super. The core idea: don't rely on stacked RL stages alone. We do SFT, multi-environment RLVR across a huge mix of agentic/reasoning/code/safety environments, then Multi-teacher On-Policy Distillation (MOPD). 10+ domain-specialized teachers, merged into the student via dense token-level guidance on its own rollouts. See Figures below for overview and tech report for all the details. 2/4

English

1.2K

Binfeng Xu@billxbf·3 Jun

@YichuanM Good training and rollout infra are just half of the story. Task & env generation is the expensive part most labs won’t share. A single task+docker can cost you $1000+. I’d recommend CUA Gym from @BowenWangNLP to see some synthetic scaling approaches

English

857

Yichuan Wang@YichuanM·3 Jun

seriously asking: agentic RL is probably one of the most hyped topics in AI research right now. yet when i look for open-source repos with both a real data recipe and production-quality infra, i can barely find any. the only three i'd confidently recommend today are: • SkyRL-Agent for SWE (@shiyi_c98) • Endless Terminals for Terminal Bench (@DimitrisPapail) • Polar Agent for SWE (@NVIDIAAI) maybe also some search agent?? (what should the best one be?) am i just bad at searching, or are 95% of agentic RL papers still not releasing a usable stack? (let alone OPD stuff...) would love recommendations! appreciate any pointers, especially for the most exciting recent applications.

English

434

33.5K

Binfeng Xu@billxbf·3 Jun

how about /sleep-and-learn for internalizing and updating the weights?

Thariq@trq212

x.com/i/article/2061…

English

1.6K

Binfeng Xu@billxbf·2 Jun

@natolambert @allen_ai looking forward to seeing what’s next! 🐐🐐

English

227

Nathan Lambert@natolambert·2 Jun

My time at Ai2 / @allen_ai has come to an end. Ai2 is a wonderful place. The last 2.5+ years building Olmo, Tulu, and other projects will be one of the peaks of my entire career. I'm extremely thankful for my teammates and the open community who made this work possible. For me, it's time to try something different. I will still be working in the open model & open science spaces (more news on that soon). In the meantime I'll be spending a few months learning, chatting with a broader network, getting married (!!) and most importantly recharging from pouring my soul into this place. I've attached the note I shared with the team and some fun photos from our time together. I'll keep cheering for Ai2 and am excited to see what you build next.