Xiangyan Liu

49 posts

@dobogiyy

🇸🇬 PhD Student @ NUS | Multimodality, Agents

Singapore · Joined August 2022
159 Following · 40 Followers
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
Text agents have their Gym. Vision agents? Not until now.

Introducing Gym-V — a unified gym-style platform for agentic vision research, with 179 procedurally generated environments across 10 domains. One API to rule them all:
📦 Offline dataset
🤖 Agentic RL training
🔧 Tool-use training
👥 Multi-agent training
📊 VLM & T2I model evaluation
All under the same reset/step interface.

Key findings:
1. Observation scaffolding matters MORE than RL algorithm choice
2. Broad curricula transfer well; narrow training causes negative transfer
3. Multi-turn interaction amplifies everything

📄 Paper: arxiv.org/abs/2603.15432
💻 Code: github.com/ModalMinds/gym…
Open the thread for a deep dive! 🧵
Fanqing Meng tweet media
8 replies · 17 reposts · 109 likes · 9.3K views
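A minimal sketch of what the shared reset/step loop in the post above could look like. The package name, env id, observation keys, and policy stub are all assumptions for illustration, not Gym-V's confirmed API:

    # Hypothetical gym-style loop for an agentic-vision environment.
    # `gym_v`, the env id, and the observation format are assumptions.
    import gym_v

    class RandomPolicy:
        def act(self, obs):
            return {"tool": "noop"}        # placeholder for a VLM policy's action

    env = gym_v.make("charts-qa-v0")       # one of the 179 environments
    agent = RandomPolicy()
    obs, info = env.reset(seed=0)          # obs could be {"image": ..., "instruction": ...}
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated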
Xinlei@XinleiWang220·
@EvanCrypto17 @ResearchWang I don't quite get why people keep lumping Claude Code and OpenClaw together. They are completely different things, and the vast majority of OpenClaw users aren't using it for programming anyway.
1 reply · 0 reposts · 1 like · 202 views
小塞@EvanCrypto17·
Takeaways from chatting with a good friend who is doing an AI PhD at CUHK and has worked at various AI model companies:
1. OpenClaw is garbage.
2. A BUPT senior spent two weeks writing a project with Claude and sold it for 30 million USD. He wants to start a company himself too but lacks ideas, and the window of opportunity only lasts through this year.
3. Claude can already train itself, and other large models basically have self-training capability as well; people just don't say so publicly.
4. Video AI will make another qualitative leap in the next two years; editing AI is much harder than expected.
5. He is especially anxious about losing his job.
82 replies · 44 reposts · 765 likes · 213.8K views
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
I am so confused that some people treat research and engineering as separate things. First become a good engineer, then learn to become a researcher.
1 reply · 1 repost · 15 likes · 4.1K views
Xiangyan Liu reposted
Yacine Mahdid@yacinelearning·
finally got done editing this awesome interview with @zzlccc, first author of the Dr. GRPO paper. in it we discuss:
- llm post-training weirdness
- is self-reflexion even real
- the absolute state of GRPO
- simplicity in algorithmic design
highly recommend for my RL-heads!
Yacine Mahdid tweet media
14 replies · 78 reposts · 691 likes · 42.9K views
Xiangyan Liu reposted
Renjie@Renjie_Ranger·
🔥Congrats to the SDPO authors @jonashuebotter @FrederikeLubeck — really enjoyed the paper, and I appreciate the discussion + citation of our work “Language Models Can Learn from Verbal Feedback Without Scalar Rewards.”

🔍Complementary angle: SDPO uses a feedback-conditioned self-teacher for on-policy distillation → dense credit assignment (feedback-as-state). We study Feedback-Conditional Policy (FCP): learn directly from (response, verbal feedback) pairs via MLE (feedback-as-goal) — super scalable and competitive with GRPO! 🚀

💡Our motivation: language priors are compositional. Text-to-image models can generate rare concepts like “a banana surfing on the ocean” 🏄‍♂️📷 because language priors let them combine and compose elements from mixed prompts seen during training.

📑 Paper: arxiv.org/pdf/2509.22638
💻 Code: github.com/sail-sg/feedba…
Renjie tweet media
4 replies · 8 reposts · 30 likes · 8.1K views
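A toy sketch of the feedback-as-goal objective described in the post above: treat the verbal feedback as extra conditioning context and do plain MLE on the response tokens. The prompt format and the HF-style model/tokenizer calls are assumptions, not the paper's released code:

    # Toy Feedback-Conditional Policy (FCP) loss:
    # maximize log p(response | prompt, verbal_feedback).
    # Model/tokenizer setup is a placeholder, not the paper's code.
    import torch

    def fcp_loss(model, tokenizer, prompt, feedback, response):
        # Feedback is part of the conditioning context ("goal"), so the
        # objective is plain MLE on the response tokens only.
        ctx = tokenizer(prompt + "\n[FEEDBACK] " + feedback, return_tensors="pt").input_ids
        tgt = tokenizer(response, return_tensors="pt").input_ids
        input_ids = torch.cat([ctx, tgt], dim=1)
        labels = input_ids.clone()
        labels[:, : ctx.shape[1]] = -100    # no loss on prompt + feedback tokens
        return model(input_ids=input_ids, labels=labels).loss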
Xiangyan Liu@dobogiyy·
nice idea and design
LobeHub@lobehub

Introducing LobeHub: Agent teammates that grow with you. …

0 replies · 0 reposts · 1 like · 129 views
Xiangyan Liu reposted
LobeHub@lobehub·
Introducing LobeHub: Agent teammates that grow with you.

LobeHub is the ultimate space for work and life: to find, build, and collaborate with agent teammates that grow with you. We’re building the world’s first and largest human–agent co-evolving network.

Two years ago, we built LobeChat, an open-source interface for using different AI models. Today, LobeChat has 70k+ GitHub stars and serves 6M+ users worldwide. How to fully unlock the power of models has always been a shared mission between us and the community.

We started with interaction — a fundamentally new, agent-first experience. Agents are no longer passive tools invoked in a single conversation. They should be proactive, always-on units of work. Treating agents as the minimal atomic unit is also the core of our agent harness infra.

Today’s agents are mostly one-off executors. Even with memory, it’s often global — and hallucinates. We build long-term agent teammates that evolve with users. Each agent has its own dedicated memory space, editable by users, allowing humans and agents to co-evolve over time. This, in turn, allows us to design clearer rewards for reinforcement learning and create cleaner environments for continual learning.

Agent teammates can work in groups. Through a multi-agent system, agent groups operate faster, more cost-effectively, and go beyond what single-agent systems can achieve. For example, a single agent often requires heavy user involvement to proceed step by step, whereas LobeHub can execute the same work from a single instruction, with a supervisor orchestrating agents that run in parallel or debate to produce better results. We are building the collaboration network among agent teammates — and between humans and agent teammates as well.

Ease of use matters. AI intelligence and shared human intelligence are equally important. With simple instructions and tool selection, you can effortlessly build and team up with agent coworkers to deliver complex, systematic work — even assembling a quant team to execute trades. Through the LobeHub community, anyone can discover, reuse, and remix agents and agent groups, customizing them to fit their own workflows, preferences, and needs.

Last but not least, our vision started with LobeChat: multi-model support is the most efficient approach for users. We believe different models excel in different scenarios. By routing across multiple models, LobeHub improves cost efficiency and unlocks capabilities that a single-model setup cannot easily support.
82 replies · 69 reposts · 322 likes · 183.8K views
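As a generic illustration of the supervisor pattern the post above describes (one instruction fanned out to parallel agents, results merged), here is a sketch; none of the names below are LobeHub's actual API:

    # Generic supervisor/worker sketch; LobeHub's real agent API is not
    # shown in the post, so everything here is assumed.
    from concurrent.futures import ThreadPoolExecutor

    def run_agent(role: str, instruction: str) -> str:
        # Placeholder: each worker would call its own model + memory space.
        return f"[{role}] result for: {instruction}"

    def supervisor(instruction: str, roles: list[str]) -> str:
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda r: run_agent(r, instruction), roles))
        # The supervisor could also have agents debate; here we just merge drafts.
        return "\n".join(drafts)

    print(supervisor("analyze BTC funding rates", ["researcher", "quant", "critic"]))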
Xiangyan Liu reposted
Longxu Dou@LongxuDou·
🚀We propose Reptile, a Terminal Agent🤖️ that lets you interact with an LLM agent directly in your terminal. The agent can execute any command or custom CLI tool to accomplish tasks, and users can define their own tools and commands for the agent to use.

✨What makes Reptile special? Compared with other CLI agents (e.g., Claude Code and Mini SWE-Agent), Reptile stands out for the following reasons:

⚡️Human-in-the-loop learning: users can inspect every step and provide immediate feedback, i.e., give feedback under the USER role or edit the LLM generation under the ASSISTANT role. These interactions are then used for SFT and RL training.

💻Terminal-only beyond bash-only: simple, stateful execution, which is more efficient than bash-only (you don’t need to re-specify the environment in every command). It doesn’t require the complicated MCP protocol—just a naive bash tool under the REPL protocol.

Github: github.com/terminal-agent…
Homepage: terminal-agent.github.io/blog/workflow/
Longxu Dou tweet media
4 replies · 17 reposts · 22 likes · 2.6K views
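The stateful-execution point above is easy to see with a persistent pseudo-terminal: working directory and environment variables survive across commands, unlike a fresh `bash -c` per call. A sketch using pexpect, as an illustration rather than Reptile's implementation:

    # Stateful terminal vs one-shot bash: one long-lived interactive bash
    # session whose state persists across commands. Illustration only.
    from pexpect import replwrap

    shell = replwrap.bash()                        # persistent interactive bash REPL
    shell.run_command("cd /tmp && export FOO=bar")
    print(shell.run_command("pwd; echo $FOO"))     # prints /tmp and bar: state persisted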
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
I think it is not new.... In DeepSeek 3.2 they use expert RL -> joint SFT -> joint RL. In LongCat they use expert RL -> model merging -> joint RL. MiMo replaces the SFT stage with OPD (on-policy distillation). Everyone knows OPD is better than SFT :)
4 replies · 4 reposts · 134 likes · 72K views
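For readers outside the loop, a schematic contrast of the two objectives compared above: SFT does MLE on a fixed dataset, while on-policy distillation (OPD) scores the student's own rollouts under the teacher. A toy sketch, not any of the cited labs' training code:

    # Schematic SFT vs OPD losses; HF-style calls, for illustration only.
    import torch.nn.functional as F

    def sft_loss(student, batch):
        # MLE on fixed expert/teacher responses (off-policy data).
        return student(input_ids=batch["input_ids"], labels=batch["labels"]).loss

    def opd_loss(student, teacher, prompts):
        # The student samples its own rollouts; the teacher supplies
        # per-token targets on exactly those tokens (on-policy data).
        rollouts = student.generate(prompts)       # schematic generate call
        s_logp = F.log_softmax(student(rollouts).logits, dim=-1)
        t_logp = F.log_softmax(teacher(rollouts).logits, dim=-1)
        # Reverse KL(student || teacher) on the student's own samples.
        return F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")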
Xiangyan Liu reposted
Xiang Yue@xiangyue96·
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, depending largely on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs from scratch on synthetic GSM-like reasoning data. Here is what we found: 🧵
Xiang Yue tweet media
28 replies · 239 reposts · 1.4K likes · 325.4K views
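For reference, the pass@128 metric mentioned above is conventionally computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): draw n samples, count the c correct ones, and estimate the probability that a random size-k subset contains at least one success:

    # Unbiased pass@k estimator: pass@k = 1 - C(n-c, k) / C(n, k).
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:          # every size-k draw must contain a correct sample
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=256, c=3, k=128))   # ~0.88 even with only 3/256 correct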
Xiangyan Liu@dobogiyy·
time to embrace DLMs🤓
Jinjie Ni@NiJinjie

1/3 🚬 Ready to smell your GPUs burning? Introducing MegaDLMs, the first production-level library for training diffusion language models, offering 3× faster training speed and up to 47% MFU. Built on Megatron-LM and Transformer-Engine, it delivers near-perfect linear scaling. github.com/JinjieNi/MegaD… You can train arbitrarily large models without compromising throughput. It was also the training backend for Super Data Learners, Quokka, and OpenMoE 2. We open-sourced more good stuff; see the next thread👇

0 replies · 0 reposts · 1 like · 148 views
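For context on figures like 47% MFU: model FLOPs utilization is the achieved model FLOPs per second divided by the accelerator's peak. A back-of-envelope sketch; the model size and token rate below are made-up inputs, and only the ~6N-FLOPs-per-token rule and the ratio itself are standard:

    # Back-of-envelope MFU check with illustrative numbers.
    def mfu(params: float, tokens_per_sec: float, peak_flops: float) -> float:
        flops_per_token = 6 * params            # ~6N training FLOPs per token
        return flops_per_token * tokens_per_sec / peak_flops

    # e.g. a 7e9-param model at 11e3 tokens/s/GPU on an H100 (~989e12 BF16 FLOPs/s):
    print(f"{mfu(7e9, 11e3, 989e12):.0%}")      # -> 47%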
Xiangyan Liu reposted
Jiawei Gu@Kuvvius·
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 arxiv.org/abs/2510.27492 (1/16)
Jiawei Gu tweet media
27 replies · 64 reposts · 315 likes · 68.7K views
Xiangyan Liu reposted
Zichen Liu@zzlccc·
Nothing feels more exciting than writing a thesis proposal on RL for LLMs before 2025 ends!! Covering a subset of my first-author works done in the past 1.5 years (after switching from traditional RL to LLM RL…) Tentative title, of course
Zichen Liu tweet media
16 replies · 62 reposts · 513 likes · 60.7K views
Xiangyan Liu reposted
Brian Li@Brian_Bo_Li·
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full support for parallelism and optimization. Not just to hack⚙️, but also to scale🚀.

So finally we built it for ourselves. github.com/EvolvingLMMs-L…

LMMs-Engine: a lean, efficient framework built to train unified multimodal models at scale. From Qwen LLM, VLM, LLaVA-OV, and WanVideo, to unified models like Qwen-Omni and BAGEL — plus Linear-Attn GDN and research prototypes like RAE and SiT — all under one modular system that seamlessly integrates diverse datasets and optimization strategies.

Powered by FSDP2 multi-dim parallelism, Ulysses sequence parallel, Flash-Attention, Liger Kernels, and Native Sparse Attention (with bonus support for the Muon optimizer for all models).
9 replies · 32 reposts · 111 likes · 54.6K views
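A schematic of the plug-and-play idea the post above describes: modality encoders registered by name and projected into the LLM's embedding space, so an encoder can be swapped without touching the autoregressive backbone. The registry and dimensions are assumptions, not LMMs-Engine's actual API:

    # Minimal encoder-registry sketch; not LMMs-Engine's real interface.
    import torch, torch.nn as nn

    ENCODERS: dict[str, nn.Module] = {}

    def register(name: str):
        def deco(cls):
            ENCODERS[name] = cls()          # instantiate and store by modality name
            return cls
        return deco

    @register("image")
    class ImageEncoder(nn.Module):
        def __init__(self, dim=1024, llm_dim=4096):
            super().__init__()
            self.proj = nn.Linear(dim, llm_dim)   # adapter into the LLM token space
        def forward(self, feats):                 # feats: (batch, patches, dim)
            return self.proj(feats)

    def embed_multimodal(llm_embeds, modality, feats):
        # Swap encoders freely; the AR backbone only sees embedding-space tokens.
        return torch.cat([ENCODERS[modality](feats), llm_embeds], dim=1)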
Xiangyan Liu reposted
Andrej Karpathy@karpathy·
I very much hope you continue working on RL! I think it's a misunderstanding that I am suggesting we need some kind of a replacement for RL. That's not accurate, and I tried to clear it up but did so poorly - they layer.

Layer 1 was base model autocomplete. Layer 2 was instruct finetuning (SFT), creating assistants in style (InstructGPT paper). Layer 3 is reinforcement learning (RL), allowing us to essentially optimize over the sampling loop too, driving away undesirable behaviors like hallucinations and stuck repetition loops, and eliciting "move 37"-like behaviors that would be really hard to SFT into the model, e.g. reasoning.

I think that each of these layers will stick around as a stage in the final solution, but I am suggesting that we need additional layers and ideas: 4, 5, 6, etc. The final AGI recipe includes a reinforcement learning stage, just as humans utilize reinforcement learning for all kinds of behaviors, as a powerful tool in the toolbox.
59 replies · 141 reposts · 2.9K likes · 262.4K views
Xiangyan Liu reposted
alex zhang@a1zhang·
What if scaling the context windows of frontier LLMs is much easier than it sounds?

We’re excited to share our work on Recursive Language Models (RLMs): a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length through a REPL environment.

On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double!) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their “prompt” and answer highly compositional queries without degradation — even better than explicit indexing/retrieval.

We link our blogpost, (still very early!) experiments, and discussion below.
alex zhang tweet media
135 replies · 377 reposts · 2.7K likes · 947.5K views
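A toy sketch of the recursive idea above: the long prompt is held as data, the model answers over manageable chunks, then combines the partial answers with one more call. The halving strategy and the ask/complete helpers are assumptions, not the authors' implementation:

    # Toy recursive decomposition over an unbounded prompt; `model.complete`
    # is a hypothetical text-completion helper.
    def ask(model, text: str, question: str) -> str:
        return model.complete(f"{text}\n\nQ: {question}\nA:")

    def rlm(model, prompt: str, question: str, limit: int = 8000) -> str:
        if len(prompt) <= limit:                   # base case: fits in one call
            return ask(model, prompt, question)
        mid = len(prompt) // 2
        left = rlm(model, prompt[:mid], question, limit)    # recurse on halves
        right = rlm(model, prompt[mid:], question, limit)
        return ask(model, f"Partial answers:\n1) {left}\n2) {right}", question)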
Xiangyan Liu reposted
Min Lin@mavenlin·
LLMs don't need MCPs, they need a terminal. Not the bash/shell tool that codex/claude are already using, but a real tty emulator, used the same way humans use it, i.e. capable of running any REPL interactively, as we will show in the thread.
7 replies · 15 reposts · 48 likes · 9.6K views
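What a real tty buys the model is the ability to drive any interactive REPL statefully, not just fire one-shot shell commands. An illustration with pexpect, not the thread's actual tooling:

    # Driving an interactive Python REPL through a pty: state lives
    # inside the REPL process across turns.
    import pexpect

    py = pexpect.spawn("python3", encoding="utf-8")
    py.expect(">>> ")
    py.sendline("x = 21 * 2")            # define state in turn one
    py.expect(">>> ")
    py.sendline("print(x)")              # use it in turn two
    py.expect(">>> ")
    print(py.before.splitlines()[-1])    # -> 42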
Xiangyan Liu reposted
Fengzhuo Zhang@FengzhuoZhang·
Why does Muon outperform Adam—and how?

🚀Answer: Muon Outperforms Adam in Tail-End Associative Memory Learning

Three Key Findings:
> Associative memory parameters are the main beneficiaries of Muon, compared to Adam.
> Muon yields more isotropic weights than Adam.
> In heavy-tailed tasks, Muon significantly improves tail-class learning compared to Adam.

Paper Link: arxiv.org/pdf/2509.26030

A thread 🧵
Fengzhuo Zhang tweet media
1 reply · 29 reposts · 56 likes · 7.2K views
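For readers new to Muon: its core update orthogonalizes the momentum matrix with a Newton-Schulz iteration before applying it, which is what pushes weights toward the isotropy the post above measures. A sketch using the published quintic coefficients; training-loop plumbing is omitted:

    # Sketch of the Muon update: orthogonalize momentum, then step.
    import torch

    def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Quintic iteration pushing all singular values of G toward 1
        # (coefficients from the Muon reference implementation).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)
        tall = X.size(0) > X.size(1)
        if tall:
            X = X.T                      # iterate in the wide orientation
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if tall else X

    def muon_step(weight, momentum, grad, lr=0.02, beta=0.95):
        # SGD-momentum accumulation, then an orthogonalized (isotropic) step.
        momentum.mul_(beta).add_(grad)
        weight.add_(newton_schulz(momentum), alpha=-lr)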