Jack Bai

399 posts

@jackbot_cs

CS PhD @UofIllinois, Research @MSFTResearch | Prev @Berkeley_ai. It's impossible to extract a good policy without a good value function.

Champaign, IL · Joined January 2024
191 Following · 1.1K Followers
Pinned Tweet
Jack Bai @jackbot_cs
😈 Today, we introduce WebGym, the largest-to-date open-source RL environment for web-agent training. It contains 300k tasks and a rollout framework optimized specifically for rollout speed in web environments. We reveal the effects of essential scaling directions we observe with WebGym. 1/n
Jack Bai retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
Jack Bai @jackbot_cs
We're proud to share that WebGym has been accepted to CVPR 2026. I'd be excited to talk with people working in the vision domain about web agents and reinforcement learning. See you in Denver soon. 😈 Code and data are now publicly available at github.com/microsoft/webg….
Toby Pohlen @TobyPhln
Three years, thousands of PRs, and a million jokes. Today was my last day @xai. To the team: you rock, no one burns the midnight oil better. To @elonmusk, thanks for taking me on board. I've learnt more about execution, speed, and product perfectionism than I could ever have imagined. Thanks for everything. My next priorities: sleep for more than 8h, write down all the things I've learnt (I have a list), and then think about what I want to do next. @gork wdyt?
Jack Bai retweeted
Qianhui Wu @5000hui
We've released the full package for GUI-Libra! 🌟 📂 Data/Model: huggingface.co/GUI-Libra 📄 Paper: arxiv.org/abs/2602.22190 🌐 Project: gui-libra.github.io Happy to hear feedback from the community!
Rui Yang @RuiYang70669025

Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔 ✨Introducing GUI-Libra (gui-libra.github.io): an 81K-example, high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️partially verifiable rather than fully verifiable!). Result: stronger native GUI agents in both offline step-wise evaluation and online environments across mobile and web domains. Takeaway: with careful data curation and a tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents. Check out our paper (arxiv.org/abs/2602.22190) and code/dataset/model (github.com/GUI-Libra/GUI-…) for more details. #GUI #agent #VLM
Jack Bai retweeted
Lunjun Zhang @LunjunZhang
RL optimizes weights. Evolution optimizes contexts. What if we combine RL and Evolutionary Algorithm (EA) into a new paradigm of LLM self-improvement? In "Evolutionary System Prompt Learning for Reinforcement Learning in LLMs", we show that RL and EA are deeply synergistic.
Soumyadeep Bakshi @GTC 2026 @soumyadeepb_
@jackbot_cs The task monitor is a pretty cool idea, gamifies the run! I suppose you can click through for details of the individual rollout?
Jack Bai @jackbot_cs
😈 Today, Microsoft open-sources WebGym: the task set, code, a suite of visualization tools, and guiding documentation. WebGym is an RL environment with the *first* open-source implementation of a fully asynchronous rollout system designed for multi-step, vision-supported web-agent trajectory collection, which is *4x-5x* faster than existing synchronous implementations. The release includes *300k* realistic web-agent tasks with comprehensive evaluation rubrics and an evaluation pipeline, together with difficulty and domain annotations. 🧵 1/6
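The asynchronous rollout idea described above can be sketched in a few lines. This is a minimal illustration, not WebGym's actual API: `FakeWebEnv`, `rollout_worker`, and `collect` are hypothetical stand-ins, and the variable per-step sleep simulates page-load latency, which is exactly what makes lockstep (synchronous) batching slow on the web.

```python
import asyncio
import random

# Hypothetical stand-in for a web environment: each step has variable
# latency (simulated page loads), so lockstep batching would always
# wait for the slowest page in the batch.
class FakeWebEnv:
    async def reset(self):
        await asyncio.sleep(random.uniform(0.001, 0.005))
        return "obs0"

    async def step(self, action):
        await asyncio.sleep(random.uniform(0.001, 0.005))  # simulated page load
        return "obs", random.random(), random.random() < 0.3  # obs, reward, done

async def rollout_worker(env, policy, out_queue, max_steps=8):
    """Collect one multi-step trajectory, independent of other workers."""
    obs = await env.reset()
    traj = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = await env.step(action)
        traj.append((action, reward))
        if done:
            break
    await out_queue.put(traj)

async def collect(num_workers=16):
    """Fully asynchronous collection: every worker advances its own episode
    at its own pace, so one slow page never stalls the other trajectories."""
    queue = asyncio.Queue()
    policy = lambda obs: "click"  # placeholder policy
    await asyncio.gather(*(rollout_worker(FakeWebEnv(), policy, queue)
                           for _ in range(num_workers)))
    return [queue.get_nowait() for _ in range(queue.qsize())]

trajectories = asyncio.run(collect())
print(len(trajectories))  # → 16
```

The speedup over a synchronous implementation comes from overlapping the waits: with 16 workers, the wall-clock time is governed by each worker's own episode, not by a batch-wide barrier at every step.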
Jack Bai @jackbot_cs
🧩Simplified. WebGym manages the RL loop with a run.sh script that separates rollout and training into two programs. *It avoids integration with any existing RL framework*, so you can easily lift the rollout-framework code into your favourite RL framework (veRL, Slime, PipelineRL, AReaL, etc.). 🧵 5/6
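The decoupling described in this tweet (rollout and training as two separate programs, orchestrated by a top-level script) can be sketched as follows. Everything here is a hypothetical stand-in, not WebGym's actual entry points: the two `-c` snippets play the roles of a rollout program and a training program, communicating only through a trajectory file, which is what lets either side be swapped into a different RL framework.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two decoupled programs. In a real setup
# these would be separate rollout and training scripts; here each phase is
# its own OS process, sharing data only via a file on disk.
ROLLOUT_CMD = [sys.executable, "-c",
               "open('trajectories.jsonl', 'w').write('{\"reward\": 1.0}\\n')"]
TRAIN_CMD = [sys.executable, "-c",
             "print(open('trajectories.jsonl').read().count('reward'),"
             " 'trajectories consumed')"]

def run_loop(iterations=2):
    """Alternate rollout and training as independent programs, so neither
    side is coupled to the other's framework or process state."""
    for i in range(iterations):
        subprocess.run(ROLLOUT_CMD, check=True)   # collect trajectories
        result = subprocess.run(TRAIN_CMD, check=True,
                                capture_output=True, text=True)
        print(f"iter {i}: {result.stdout.strip()}")

run_loop()
```

Because the only contract between the two programs is the trajectory file format, the rollout side can be reused unchanged under veRL, Slime, or any other trainer that can read the same format.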
Andrew Akbashev @Andrew_Akbashev
A really dangerous situation. Too many submissions. Too many generated papers. Little responsibility.

1. In 2026, more than 24,000 submissions were made to the International Conference on Machine Learning (ICML). That's TWO times more than in 2025. To fight it, the organizers now require researchers to pay $100 for every subsequent paper.
2. LLM adoption has increased researcher productivity by 90% (there's a recent paper in Science).
3. The number of papers is becoming far too high. Submissions to arXiv have risen by 50% since 2022.
4. There are simply not enough reviewers. Plus, many scientists no longer want to invest precious time in reviewing for free.
5. We can't easily distinguish AI-made papers from genuine ones.

Important words from Paul Ginsparg, a co-founder of arXiv: "AI slop frequently can't be discriminated just by looking at the abstract, or even by just skimming the full text. This makes it an 'existential threat' to the system." Basically, we're getting closer to the tipping point.

📍 Many professors blame the AI. But the problem likely lies elsewhere:

1. Without a sufficient number of papers, many PIs can't get funded. They have to prove their credibility to reviewers, and their proposals have to rely on prior publications. In many countries, there are informal (or even formal) expectations for how many papers a group of a certain size has to publish to survive, funding-wise.
2. Our students and postdocs need papers if they want to be hired into faculty roles. Yes, some departments hire people with few publications, but the majority still want to ensure their faculty can get funded. If funding is partly a function of papers, this feeds into hiring decisions.
3. The number of papers matters if you want high-level awards. Many of them are not given because you published one paper (even a great one); they are given because you made a meaningful CONTRIBUTION to the field. How do you make one? Publish more papers.
4. Tenure promotions in many places take the number of your papers into account (often indirectly). Your tenure may get delayed if you don't publish enough. Not everywhere, but for many mid- to low-ranked universities the story is more or less the same.

And there are many more factors to mention.

📍 My opinion: much of this is rooted in how funding is distributed. There is a strong correlation between the requirements at a university and the funding-acquisition criteria. If funding were based ONLY on the quality of published papers, universities would hire people for the quality of their science. If funding agencies strongly discouraged publishing too many papers, universities wouldn't expect numbers from faculty during promotions, and some supervisors wouldn't pressure students and postdocs to publish unfinished studies and low-quality data. Yes, we need good detectors of fake papers. But we also need the right policies and better funding-allocation criteria.
Jack Bai retweeted
Chenlu Ye @ye_chenlu
1/5 Happy CNY🎊 Still bothered by off-policy RL instability in LLMs? Introducing a new method💡Adaptive Layerwise Perturbation (ALP)💡, a simple but robust fix that outperforms GRPO/MIS/Bypass and achieves better stability (KL, entropy) and exploration! 🔗 Blog: beneficial-curiosity-d98.notion.site/Adaptive-Layer…