Jack Bai

399 posts

@jackbot_cs

CS PhD @UofIllinois, Research @MSFTResearch | Prev @Berkeley_ai. It's impossible to extract a good policy without a good value function.

Champaign, IL · Joined January 2024
191 Following · 1.1K Followers
Pinned Tweet
Jack Bai @jackbot_cs
😈 Today, we introduce WebGym, the largest-to-date open-source RL environment for web-agent training. It contains 300k tasks and a rollout framework optimized specifically for rollout speed in web environments. We reveal the effects of essential scaling directions we observe with WebGym. 1/n
Jack Bai retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
Jack Bai @jackbot_cs
We're proud to share that WebGym has been accepted to CVPR 2026. I'd be excited to talk with people working in the vision domain about web agents and reinforcement learning. See you in Denver soon. 😈 Code and data are now publicly available at github.com/microsoft/webg….
Toby Pohlen @TobyPhln
Three years, thousands of PRs, and a million jokes. Today was my last day @xai. To the team: you rock, no one burns the midnight oil better. To @elonmusk, thanks for taking me on board. I've learnt more about execution, speed, and product perfectionism than I could ever have imagined. Thanks for everything. My next priorities: sleep for more than 8h, write down all the things I've learnt (I have a list), and then think about what I want to do next. @gork wdyt?
Jack Bai retweeted
Qianhui Wu @5000hui
We've released the full package for GUI-Libra! 🌟 📂 Data/Model: huggingface.co/GUI-Libra 📄 Paper: arxiv.org/abs/2602.22190 🌐 Project: gui-libra.github.io Happy to hear feedback from the community!
Rui Yang @RuiYang70669025

Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔 ✨Introducing GUI-Libra (gui-libra.github.io): an 81K-example, high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️partially verifiable rather than fully verifiable!). Result: stronger native GUI agents in both offline step-wise evaluation and online environments across mobile and web domains. Takeaway: with careful data curation and a tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents. Check out our paper (arxiv.org/abs/2602.22190) and code/dataset/model (github.com/GUI-Libra/GUI-…) for more details. #GUI #agent #VLM
Jack Bai retweeted
Lunjun Zhang @LunjunZhang
RL optimizes weights. Evolution optimizes contexts. What if we combine RL and Evolutionary Algorithm (EA) into a new paradigm of LLM self-improvement? In "Evolutionary System Prompt Learning for Reinforcement Learning in LLMs", we show that RL and EA are deeply synergistic.
Soumyadeep Bakshi @GTC 2026 @soumyadeepb_
@jackbot_cs The task monitor is a pretty cool idea, gamifies the run! I suppose you can click through for details of the individual rollout?
Jack Bai @jackbot_cs
😈 Today, Microsoft open-sources WebGym: the task set, code, a suite of visualization tools, and guiding documentation. WebGym is an RL environment with the *first* open-source implementation of a fully asynchronous rollout system designed for multi-step, vision-supported web-agent trajectory collection, which is *4x-5x* faster than existing synchronous implementations. The release includes *300k* realistic web-agent tasks with comprehensive evaluation rubrics and an evaluation pipeline, together with difficulty and domain annotations. 🧵 1/6
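The asynchronous rollout idea described above can be sketched in a few lines. This is a minimal illustration, not WebGym's actual API: `FakeWebEnv`, `rollout_worker`, and `collect` are hypothetical stand-ins, and the variable per-step sleep simulates page-load latency, which is exactly what makes lockstep (synchronous) batching slow on the web.

```python
import asyncio
import random

# Hypothetical stand-in for a web environment: each step has variable
# latency (simulated page loads), so lockstep batching would always
# wait for the slowest page in the batch.
class FakeWebEnv:
    async def reset(self):
        await asyncio.sleep(random.uniform(0.001, 0.005))
        return "obs0"

    async def step(self, action):
        await asyncio.sleep(random.uniform(0.001, 0.005))  # simulated page load
        return "obs", random.random(), random.random() < 0.3  # obs, reward, done

async def rollout_worker(env, policy, out_queue, max_steps=8):
    """Collect one multi-step trajectory, independent of other workers."""
    obs = await env.reset()
    traj = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = await env.step(action)
        traj.append((action, reward))
        if done:
            break
    await out_queue.put(traj)

async def collect(num_workers=16):
    """Fully asynchronous collection: every worker advances its own episode
    at its own pace, so one slow page never stalls the other trajectories."""
    queue = asyncio.Queue()
    policy = lambda obs: "click"  # placeholder policy
    await asyncio.gather(*(rollout_worker(FakeWebEnv(), policy, queue)
                           for _ in range(num_workers)))
    return [queue.get_nowait() for _ in range(queue.qsize())]

trajectories = asyncio.run(collect())
print(len(trajectories))  # → 16
```

The speedup over a synchronous implementation comes from overlapping the waits: with 16 workers, the wall-clock time is governed by each worker's own episode, not by a batch-wide barrier at every step.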
Jack Bai @jackbot_cs
🧩Simplified. WebGym manages the RL loop with a run.sh script that separates rollout and training into two programs. *It avoids integration with any existing RL framework*, so you can easily lift the rollout-framework code into your favourite RL framework (veRL, Slime, PipelineRL, AReaL, etc.). 🧵 5/6
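The decoupling described in this tweet (rollout and training as two separate programs, orchestrated by a top-level script) can be sketched as follows. Everything here is a hypothetical stand-in, not WebGym's actual entry points: the two `-c` snippets play the roles of a rollout program and a training program, communicating only through a trajectory file, which is what lets either side be swapped into a different RL framework.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two decoupled programs. In a real setup
# these would be separate rollout and training scripts; here each phase is
# its own OS process, sharing data only via a file on disk.
ROLLOUT_CMD = [sys.executable, "-c",
               "open('trajectories.jsonl', 'w').write('{\"reward\": 1.0}\\n')"]
TRAIN_CMD = [sys.executable, "-c",
             "print(open('trajectories.jsonl').read().count('reward'),"
             " 'trajectories consumed')"]

def run_loop(iterations=2):
    """Alternate rollout and training as independent programs, so neither
    side is coupled to the other's framework or process state."""
    for i in range(iterations):
        subprocess.run(ROLLOUT_CMD, check=True)   # collect trajectories
        result = subprocess.run(TRAIN_CMD, check=True,
                                capture_output=True, text=True)
        print(f"iter {i}: {result.stdout.strip()}")

run_loop()
```

Because the only contract between the two programs is the trajectory file format, the rollout side can be reused unchanged under veRL, Slime, or any other trainer that can read the same format.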
Andrew Akbashev @Andrew_Akbashev
A really dangerous situation. Too many submissions. Too many generated papers. Little responsibility.

1. In 2026, more than 24,000 submissions were made to the International Conference on Machine Learning (ICML). That's TWO times more than in 2025. To fight it, the organizers now require researchers to pay $100 for every subsequent paper.
2. LLM adoption has increased researcher productivity by 90% (there's a recent paper in Science).
3. The number of papers is becoming far too high. Submissions to arXiv have risen by 50% since 2022.
4. There are simply not enough reviewers. Plus, many scientists no longer want to invest precious time in reviewing for free.
5. We can't easily distinguish AI-made papers from genuine ones.

Important words from Paul Ginsparg, a co-founder of arXiv: "AI slop frequently can't be discriminated just by looking at the abstract, or even by just skimming the full text. This makes it an 'existential threat' to the system." Basically, we're getting closer to the tipping point.

📍 Many professors blame the AI. But the problem likely lies elsewhere:

1. Without a sufficient number of papers, many PIs can't get funded. They have to prove their credibility to reviewers, and their proposals have to rely on prior publications. In many countries, there are informal (or even formal) expectations for how many papers a group of a certain size has to publish to survive, funding-wise.
2. Our students and postdocs need papers if they want to be hired into faculty roles. Yes, some departments hire people with few publications, but the majority still want to ensure their faculty can get funded. If funding is partly a function of papers, this feeds into hiring decisions.
3. The number of papers matters if you want high-level awards. Many of them are not given because you published one paper (even a great one); they are given because you made a meaningful CONTRIBUTION to the field. How do you make one? Publish more papers.
4. Tenure promotions in many places take the number of your papers into account (often indirectly). Your tenure may get delayed if you don't publish enough. Not everywhere, but for many mid- to low-ranked universities the story is more or less the same.

And there are many more factors to mention.

📍 My opinion: much of this is rooted in how funding is distributed. There is a strong correlation between the requirements at a university and the funding-acquisition criteria. If funding were based ONLY on the quality of published papers, universities would hire people for the quality of their science. If funding agencies strongly discouraged publishing too many papers, universities wouldn't expect numbers from faculty during promotions, and some supervisors wouldn't pressure students and postdocs to publish unfinished studies and low-quality data. Yes, we need good detectors of fake papers. But we also need the right policies and better funding-allocation criteria.
Jack Bai retweeted
Chenlu Ye @ye_chenlu
1/5 Happy CNY🎊 Still bothered by off-policy RL instability in LLMs? Introducing a new method💡Adaptive Layerwise Perturbation (ALP)💡, a simple but robust fix that outperforms GRPO/MIS/Bypass and achieves better stability (KL, entropy) and exploration! 🔗 Blog: beneficial-curiosity-d98.notion.site/Adaptive-Layer…