Binfeng Xu

641 posts

@billxbf

AI @nvidia | agent RL for computer-use. Retiring myself with continual learning. Opinions are mine.

NY · Joined May 2022
221 Following · 969 Followers
Pinned Tweet
Binfeng Xu (@billxbf)
Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change. Find a problem, design the harness, and train your own agents! 🧵
[image]
26
144
903
131K
Binfeng Xu (@billxbf)
@JimSZ7 yes, you can always create noisy environments and tasks during training to upsample rollouts under failure modes.
1
0
0
22
Jim_SZ🇭🇰 (@JimSZ7)
@billxbf PRM fixes credit assignment given the trace. The harder gap is coverage. Crash recovery and state repair are off distribution from clean rollouts, so they rarely get sampled and the PRM never scores them. You almost have to inject faults to get the traces worth crediting.
1
0
0
17
Binfeng Xu (@billxbf)
@JimSZ7 that’s why you need PRM for credit assignment
1
0
0
21
Jim_SZ🇭🇰 (@JimSZ7)
@billxbf Treating the harness as a black box is right for rollouts. The catch is what the reward sees. A clean trace shows whether one run succeeded, not whether it recovers after a crash mid step or keeps state coherent across hours. Those modes never show in a clean rollout.
1
0
1
19
Binfeng Xu retweeted
Kimi.ai (@Kimi_Moonshot)
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai
[2 images]
621
1.7K
13.8K
2.3M
Desh Raj (@rdesh26)
@willccbb Unfortunately, no job title containing "engineer" or "scientist" has ever been considered hot in NYC :)
1
0
6
648
will brown (@willccbb)
NYC’s hottest new job title is Principal Agent Engineer
13
0
103
9.9K
vogel (@ryanvogel)
I still think we need some /learn command. I just burned 1.3M fable tokens on some weird expo bug, then fixed it, but then spawned a new session and fable made the same mistake AGAIN. There needs to be like some internal repo stackoverflow reference guide.
44
2
230
75.9K
Nathan Lambert (@natolambert)
I quickly became friends with Arcee's leadership and can't help but root for their humble approach to building the open ecosystem. No nonsense licenses, no projecting, just enabling broad access to efficient intelligence. I'm happily supporting their research as an advisor.
Arcee.ai (@arcee_ai)

We are thrilled to announce that @natolambert is joining Arcee as a Research Advisor. Nathan’s work and thought leadership have been instrumental to the open model ecosystem, and his guidance comes at a critical time as open builders face growing pressure. This is a major addition for Arcee and the American OS movement. Nathan brings the conviction, taste, and technical depth this moment calls for.

49
26
831
60.8K
Binfeng Xu (@billxbf)
@suchenzang Agree with everything here. But what bothers many people isn’t that it’s a business, but the use of obviously dishonest narratives like safety as cover. The issue is less the decision itself and more the framing.
0
0
7
424
Susan Zhang (@suchenzang)
anthropic doesn't owe anyone "frontier capabilities". none of the labs do. they are all simply selling a product, or a story, that people pay for. that aside, the more telling bit is how far anthropic is willing to go to secure a narrative around "capability slowdown", post a massive raise, before an ipo, and with enterprise contracts rising for those rich enough to pay to similarly keep up the image of "powered-by/secured-by agentic AI". with the amount of capex spent so far, this was never meant to be some democratizing technology "for the people". this is all simply just business.
87
59
1.3K
145.1K
Binfeng Xu retweeted
Nathan Lambert (@natolambert)
Why I think Anthropic's uneven safety policies with the release of Claude Fable 5 undermine the broader AI community's cohesion and accelerate us to more uncertainty and risk in AI's near-term evolution. interconnects.ai/p/claude-fable…
13
49
407
36.3K
Binfeng Xu (@billxbf)
@giffmana hard to sustain open research without a business
0
0
3
292
Lucas Beyer (bl16) (@giffmana)
Do I understand it correctly that the OLMo from-scratch series is coming to an end? If so, looks like NVIDIA stepped up just in time with Nemotron models as the only remaining fully-open (ie not just weight drop) from-scratch LLM team.
40
15
470
81.5K
Binfeng Xu (@billxbf)
@slime_framework Should advantage estimation live in the rollout or the training part? I increasingly feel that algorithm-level logic (how you handle multi-trace, assign reward, and estimate advantage) is cleanest when bundled within rollout, while the trainer is simplified to a backprop machine.
0
1
2
468
slime (@slime_framework)
Most RL frameworks are moving from “engine mode” to “server mode”. slime goes one step further: the RL job does not need to own the rollout servers at all. Bring your own SGLang fleet, already deployed and managed by your serving system. slime connects to it, registers it with the router, generates rollouts, and syncs updated actor weights via NCCL or disk-based full/delta transport. This is the deployment shape we believe large-scale agentic RL is moving toward: training and inference as independently managed systems, connected by a clean rollout + weight-sync contract.
5
13
143
25.1K
Binfeng Xu (@billxbf)
@YichuanM Good training and rollout infra are just half of the story. Task & env generation are the expensive part most labs won’t share. A single task+docker can cost you $1000+. I’d recommend CUA Gym from @BowenWangNLP to see some synthetic scaling approaches.
2
0
7
857
Yichuan Wang (@YichuanM)
seriously asking: agentic RL is probably one of the most hyped topics in AI research right now. yet when i look for open-source repos with both a real data recipe and production-quality infra, i can barely find any. the only three i'd confidently recommend today are:
• SkyRL-Agent for SWE (@shiyi_c98)
• Endless Terminals for Terminal Bench (@DimitrisPapail)
• Polar Agent for SWE (@NVIDIAAI)
maybe also some search agent?? (what should the best one be?) am i just bad at searching, or are 95% of agentic RL papers still not releasing a usable stack? (let alone OPD stuff...) would love recommendations! appreciate any pointers, especially for the most exciting recent applications.
24
26
434
33.5K
Nathan Lambert (@natolambert)
My time at Ai2 / @allen_ai has come to an end. Ai2 is a wonderful place. The last 2.5+ years building Olmo, Tulu, and other projects will be one of the peaks of my entire career. I'm extremely thankful for my teammates and the open community who made this work possible. For me, it's time to try something different. I will still be working in the open model & open science spaces (more news on that soon). In the meantime I'll be spending a few months learning, chatting with a broader network, getting married (!!) and most importantly recharging from pouring my soul into this place. I've attached the note I shared with the team and some fun photos from our time together. I'll keep cheering for Ai2 and am excited to see what you build next.
[4 images]
142
42
1.8K
151.7K
Binfeng Xu retweeted
NVIDIA AI (@NVIDIAAI)
Nemotron 3 Ultra is coming this week. ⌛️
105
352
3.3K
389.5K
Benjamin Glickenhaus (@benglickenhaus)
do people have a flow they like for rubber ducking with an agent? even fast mode is too slow to stay in the flow
1
0
4
1.9K