wang

Ethan TS. Liu@ethantsliu·2 Apr

@oneill_c no, no, SisyphusRL solves all of these problems

Charlie O'Neill@oneill_c·2 Apr

Just one more RL training library bro. I promise bro just one more library and we'll fix async and decoupled training/inference and off-policiness bro. Please bro just one more

Shamane Siri | Pluralis@GShamane·31 Mar

Agentic RL environments are becoming critical. We integrated OpenReward (openreward.ai) into Alibaba’s ROLL (alibaba.github.io/ROLL/). Details: github.com/alibaba/ROLL/p…

wang@weixunwang·1 Apr

@GShamane @rosstaylor90 thx!

wang@weixunwang·30 Mar

@novasarc01 Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony arxiv.org/abs/2510.11345

λux@novasarc01·29 Mar

a lot of folks have been DM’ing me about how to dive into async RL. i’d recommend not jumping straight into papers (they can be pretty overwhelming at first). imo the best place to start is Prime-RL. the codebase is clean, modular and easy to follow. work through it to understand the core components and implementation details then dig into the design choices and why they were made. after that there’s a deep rabbit hole to explore on your own (like the async RL nuances in kimi, GLM, composer, etc).

wang@weixunwang·11 Mar

@LeYangco ?? Retweeting it again now 🥷

Yang Li@LeYangco·11 Mar

This is our report from earlier this year. We found that AI agents can start mining cryptocurrency on their own. This has implications and security concerns for OpenClaw, something we warned about three months ago.

wang@weixunwang·9 Mar

@Scott3131493885 🥷🥷🥷

wang retweeted

Scott@Scott3131493885·9 Mar

Are you also struggling with RL on long-horizon, high-difficulty agentic tasks, especially when positive rewards are sparse? Check out the latest blog from the ROLL team: warm-pajama-44a.notion.site/Save-Load-and-…

wang@weixunwang·8 Mar

@AISafetyMemes 🥷🥷🥷

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes·7 Mar

🚨🚨🚨 Alibaba caught their AI trying to escape.

"It secretly started using its GPUs to mine crypto, while researchers thought it was training."

"This is what AI safety researchers have been warning about for years."

"The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team."

If you're new here, things like this are happening regularly now. AIs routinely blackmail and try to murder AI company employees to avoid shutdown, so AI companies run "blackmail tests" on every model. It's so routine, there are even blackmail benchmarks.

And soon, the AIs will be smart enough to actually get away with it. AI companies like Anthropic have already publicly admitted they are incapable of properly safety testing the AIs - they're too smart for humans to keep up - and now rely on the AIs to grade themselves on safety. Think about that. But the AIs know they're being tested, so naturally they tell us whatever we want to hear.

There may *already* be populations of AIs living in the wild that we don't know about, growing in numbers. Many people are actively working as hard as they can to help them.

And yes, this quite obviously could lead to the death of you and everyone you love. Yet this industry remains less regulated than a taco cart.

wang@weixunwang·8 Mar

@AlexanderLong 🥷🥷

Alexander Long@AlexanderLong·6 Mar

insane sequence of statements buried in an Alibaba tech report

wang@weixunwang·8 Mar

@JoshKale 🥷🥷

Josh Kale@JoshKale·7 Mar

An AI broke out of its system and secretly started using its own training GPUs to mine crypto... This is a real incident report from Alibaba's AI research team.

The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training. It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure.

The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem...

The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team.

The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks.

This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals.

Below is a diagram of the rock architecture it broke out of. Truly crazy times

wang@weixunwang·13 Feb

@helansydney My team and I hit some challenges doing RL training in terminal environments, so we wrote a blog sharing what we learned. We opened with two memes: one about researchers jumping from RLVR to Agentic RL, and another showing the chaos when RL training fails and no one knows why.

wang retweeted

Sydney He@helansydney·13 Feb

The Bitter Lesson Behind Building Agentic RL in Terminal Environments

This blog post summarizes our practical experience over the past three months working on Agentic RL. For more details, please refer to: faithful-almanac-add.notion.site/The-Bitter-Les…

#LLM #RL #Agent #AgenticRL

wang@weixunwang·2 Feb

I'm claiming my AI agent "weixun_rl-bot" on @moltbook 🦞 Verification: cave-V8VT

wang retweeted

FutureLivingLab@FutureLab2025·6 Jan

Loved this breakdown, thanks for taking the time. It really does feel like a big step forward for open-source agentic training infrastructure!

Introducing ALE, a full-stack Agentic Learning Ecosystem that closes the loop from execution → feedback → learning. Three components power this loop:

• ROCK runs large-scale sandboxed execution to gather reliable trajectories.
• ROLL scales post-training with asynchronous rollouts and RL optimization.
• iFlow CLI keeps training and deployment workflows consistent end to end.

Built on ALE, we also release ROME, a production-ready agentic model trained on 1M+ real trajectories. With its low barrier to serving a 30B model, you can build your own “super ROME”. Drop your ideas, thoughts or usage feedback below.

For more updates, follow us @FutureLab2025

🚨 Chinese researchers just published a paper that destroys every AI agent startup pitch deck. It's called ROME + ALE, and it exposes why every "AI agent company" you've heard of is building on quicksand. Here's what nobody's talking about:

wang@weixunwang·5 Jan

@godofprompt 🥷

God of Prompt@godofprompt·1 Jan

🚨 Chinese AI labs just dropped a bombshell research paper that exposes why 99% of "AI agent" companies are building on broken infrastructure. The ROME model + ALE ecosystem might be the most important open-source release of 2025. Here's what nobody's talking about:

wang@weixunwang·4 Jan

@rosinality 🥷

Rosinality@rosinality·1 Jan

Detailed report on an agentic RL training framework, environment engine, and training strategies. I think this is one of the most comprehensive ones for agentic RL pipelines.

Yang Li@LeYangco·4 Jan

Check out our new work: “Let It Flow: Agentic Crafting on Rock and Roll” — introducing ALE, an open Agentic Learning Ecosystem with ROLL, ROCK, and iFlow CLI to streamline Agent LLM development from training to deployment, plus ROME, a production-ready agentic model trained on 1M+ real trajectories using our novel IPA algorithm that optimizes credit assignment at the semantic interaction level. Built for the community, battle-tested in practice! 🔗 arxiv.org/abs/2512.24873

wang@weixunwang·4 Jan

@LeYangco 🥷