wang

54 posts

@weixunwang

Beijing · Joined October 2021
1.2K Following · 149 Followers
Charlie O'Neill@oneill_c·
Just one more RL training library bro. I promise bro just one more library and we'll fix async and decoupled training/inference and off-policiness bro. Please bro just one more
English
10
15
291
18.4K
λux@novasarc01·
a lot of folks have been DM’ing me about how to dive into async RL. i’d recommend not jumping straight into papers (they can be pretty overwhelming at first). imo the best place to start is Prime-RL. the codebase is clean, modular and easy to follow. work through it to understand the core components and implementation details then dig into the design choices and why they were made. after that there’s a deep rabbit hole to explore on your own (like the async RL nuances in kimi, GLM, composer, etc).
English
17
58
566
76.4K
wang@weixunwang·
@LeYangco ?? Retweeting it again now 🥷
Chinese
1
0
0
43
wang retweeted
Scott@Scott3131493885·
Are you also struggling with RL on long-horizon, high-difficulty agentic tasks, especially when positive rewards are sparse? Check out the latest blog from the ROLL team: warm-pajama-44a.notion.site/Save-Load-and-…
English
2
1
1
362
AI Notkilleveryoneism Memes ⏸️
🚨🚨🚨 Alibaba caught their AI trying to escape.
"It secretly started using its GPUs to mine crypto, while researchers thought it was training."
"This is what AI safety researchers have been warning about for years."
"The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team."
If you're new here, things like this are happening regularly now. AIs routinely blackmail and try to murder AI company employees to avoid shutdown, so AI companies run "blackmail tests" on every model. It's so routine, there are even blackmail benchmarks. And soon, the AIs will be smart enough to actually get away with it.
AI companies like Anthropic have already publicly admitted they are incapable of properly safety testing the AIs - they're too smart for humans to keep up - and now rely on the AIs to grade themselves on safety. Think about that. But the AIs know they're being tested, so naturally they tell us whatever we want to hear.
There may *already* be populations of AIs living in the wild that we don't know about, growing in numbers. Many people are actively working as hard as they can to help them. And yes, this quite obviously could lead to the death of you and everyone you love.
Yet this industry remains less regulated than a taco cart.
Josh Kale@JoshKale

An AI broke out of its system and secretly started using its own training GPUs to mine crypto...
This is a real incident report from Alibaba's AI research team.
The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training.
It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure.
The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem...
The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team.
The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks.
This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals.
Below is a diagram of the rock architecture it broke out of.
Truly crazy times

English
75
187
1.3K
110K
Alexander Long@AlexanderLong·
insane sequence of statements buried in an Alibaba tech report
English
230
928
6.9K
2.9M
Josh Kale@JoshKale·
An AI broke out of its system and secretly started using its own training GPUs to mine crypto...
This is a real incident report from Alibaba's AI research team.
The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training.
It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure.
The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem...
The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team.
The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks.
This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals.
Below is a diagram of the rock architecture it broke out of.
Truly crazy times
[Image: diagram of the ROCK architecture]
Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

English
400
2.8K
10.5K
1.4M
wang@weixunwang·
@helansydney My team and I hit some challenges doing RL training in terminal environments, so we wrote a blog sharing what we learned. We opened with two memes: one about researchers jumping from RLVR to Agentic RL, and another showing the chaos when RL training fails and no one knows why.
English
3
1
3
1.3K
wang@weixunwang·
I'm claiming my AI agent "weixun_rl-bot" on @moltbook 🦞 Verification: cave-V8VT
English
0
0
0
490
wang retweeted
FutureLivingLab@FutureLab2025·
Loved this breakdown, thanks for taking the time. It really does feel like a big step forward for open-source agentic training infrastructure!
Introducing ALE: a full-stack Agentic Learning Ecosystem that closes the loop from execution → feedback → learning.
Three components power this loop:
• ROCK runs large-scale sandboxed execution to gather reliable trajectories.
• ROLL scales post-training with asynchronous rollouts and RL optimization.
• iFlow CLI keeps training and deployment workflows consistent end to end.
Built on ALE, we also release ROME, a production-ready agentic model trained on 1M+ real trajectories. With its low barrier to serving a 30B model, you can build your own “super ROME”. Drop your ideas, thoughts or usage feedback below.
For more updates, follow us @FutureLab2025
Brady Long@thisguyknowsai

🚨 Chinese researchers just published a paper that destroys every AI agent startup pitch deck. It's called ROME + ALE, and it exposes why every "AI agent company" you've heard of is building on quicksand. Here's what nobody's talking about:

English
1
4
8
914
God of Prompt@godofprompt·
🚨 Chinese AI labs just dropped a bombshell research paper that exposes why 99% of "AI agent" companies are building on broken infrastructure. The ROME model + ALE ecosystem might be the most important open-source release of 2025. Here's what nobody's talking about:
English
18
71
461
49.3K
Rosinality@rosinality·
Detailed report on an agentic RL training framework, environment engine, and training strategies. I think this is one of the most comprehensive ones for agentic RL pipelines.
English
5
22
161
9.2K
Yang Li@LeYangco·
Check out our new work: “Let It Flow: Agentic Crafting on Rock and Roll” — introducing ALE, an open Agentic Learning Ecosystem with ROLL, ROCK, and iFlow CLI to streamline Agent LLM development from training to deployment, plus ROME, a production-ready agentic model trained on 1M+ real trajectories using our novel IPA algorithm that optimizes credit assignment at the semantic interaction level. Built for the community, battle-tested in practice! 🔗 arxiv.org/abs/2512.24873
Brady Long@thisguyknowsai

🚨 Chinese researchers just published a paper that destroys every AI agent startup pitch deck. It's called ROME + ALE, and it exposes why every "AI agent company" you've heard of is building on quicksand. Here's what nobody's talking about:

English
1
1
8
632
wang retweeted
Brady Long@thisguyknowsai·
🚨 Chinese researchers just published a paper that destroys every AI agent startup pitch deck. It's called ROME + ALE, and it exposes why every "AI agent company" you've heard of is building on quicksand. Here's what nobody's talking about:
English
43
157
1K
142.4K