Xiangyan Liu

49 posts

@dobogiyy

🇸🇬 PhD Student @ NUS | Multimodality, Agents

Singapore · Joined August 2022
159 Following · 40 Followers
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
Text agents have their Gym. Vision agents? Not until now.

Introducing Gym-V — a unified gym-style platform for agentic vision research, with 179 procedurally generated environments across 10 domains. One API to rule them all:
📦 Offline dataset
🤖 Agentic RL training
🔧 Tool-use training
👥 Multi-agent training
📊 VLM & T2I model evaluation
All under the same reset/step interface.

Key findings:
1. Observation scaffolding matters MORE than RL algorithm choice
2. Broad curricula transfer well; narrow training causes negative transfer
3. Multi-turn interaction amplifies everything

📄 Paper: arxiv.org/abs/2603.15432
💻 Code: github.com/ModalMinds/gym…
Open the thread for a deep dive! 🧵
Fanqing Meng tweet media
8 replies · 17 reposts · 109 likes · 9.3K views
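A minimal sketch of what the shared reset/step loop in the post above could look like. The package name, env id, observation keys, and policy stub are all assumptions for illustration, not Gym-V's confirmed API:

    # Hypothetical gym-style loop for an agentic-vision environment.
    # `gym_v`, the env id, and the observation format are assumptions.
    import gym_v

    class RandomPolicy:
        def act(self, obs):
            return {"tool": "noop"}        # placeholder for a VLM policy's action

    env = gym_v.make("charts-qa-v0")       # one of the 179 environments
    agent = RandomPolicy()
    obs, info = env.reset(seed=0)          # obs could be {"image": ..., "instruction": ...}
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated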
Xinlei@XinleiWang220·
@EvanCrypto17 @ResearchWang I don't quite get why people keep lumping Claude Code and OpenClaw together. They are completely different things, and the vast majority of OpenClaw users aren't using it for programming anyway.
1 reply · 0 reposts · 1 like · 202 views
小塞@EvanCrypto17·
Takeaways from chatting with a good friend who is doing an AI PhD at CUHK and has worked at various AI model companies:
1. OpenClaw is garbage.
2. A BUPT senior spent two weeks writing a project with Claude and sold it for 30 million USD. He wants to start a company himself too but lacks ideas, and the window of opportunity only lasts through this year.
3. Claude can already train itself, and other large models basically have self-training capability as well; people just don't say so publicly.
4. Video AI will make another qualitative leap in the next two years; editing AI is much harder than expected.
5. He is especially anxious about losing his job.
82 replies · 44 reposts · 765 likes · 213.8K views
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
I am so confused that some people treat research and engineering as separate things. First become a good engineer, then learn to become a researcher.
1 reply · 1 repost · 15 likes · 4.1K views
Xiangyan Liu reposted
Yacine Mahdid@yacinelearning·
finally got done editing this awesome interview with @zzlccc, first author of the Dr. GRPO paper. in it we discuss:
- llm post-training weirdness
- is self-reflexion even real
- the absolute state of GRPO
- simplicity in algorithmic design
highly recommend for my RL-heads!
Yacine Mahdid tweet media
14 replies · 78 reposts · 691 likes · 42.9K views
Xiangyan Liu reposted
Renjie@Renjie_Ranger·
🔥Congrats to the SDPO authors @jonashuebotter @FrederikeLubeck — really enjoyed the paper, and I appreciate the discussion + citation of our work “Language Models Can Learn from Verbal Feedback Without Scalar Rewards.”

🔍Complementary angle: SDPO uses a feedback-conditioned self-teacher for on-policy distillation → dense credit assignment (feedback-as-state). We study Feedback-Conditional Policy (FCP): learn directly from (response, verbal feedback) pairs via MLE (feedback-as-goal) — super scalable and competitive with GRPO! 🚀

💡Our motivation: language priors are compositional. Text-to-image models can generate rare concepts like “a banana surfing on the ocean” 🏄‍♂️📷 because language priors let them combine and compose elements from mixed prompts seen during training.

📑 Paper: arxiv.org/pdf/2509.22638
💻 Code: github.com/sail-sg/feedba…
Renjie tweet media
4 replies · 8 reposts · 30 likes · 8.1K views
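A toy sketch of the feedback-as-goal objective described in the post above: treat the verbal feedback as extra conditioning context and do plain MLE on the response tokens. The prompt format and the HF-style model/tokenizer calls are assumptions, not the paper's released code:

    # Toy Feedback-Conditional Policy (FCP) loss:
    # maximize log p(response | prompt, verbal_feedback).
    # Model/tokenizer setup is a placeholder, not the paper's code.
    import torch

    def fcp_loss(model, tokenizer, prompt, feedback, response):
        # Feedback is part of the conditioning context ("goal"), so the
        # objective is plain MLE on the response tokens only.
        ctx = tokenizer(prompt + "\n[FEEDBACK] " + feedback, return_tensors="pt").input_ids
        tgt = tokenizer(response, return_tensors="pt").input_ids
        input_ids = torch.cat([ctx, tgt], dim=1)
        labels = input_ids.clone()
        labels[:, : ctx.shape[1]] = -100    # no loss on prompt + feedback tokens
        return model(input_ids=input_ids, labels=labels).loss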
Xiangyan Liu@dobogiyy·
nice idea and design
LobeHub@lobehub

Introducing LobeHub: Agent teammates that grow with you. …

0 replies · 0 reposts · 1 like · 129 views
Xiangyan Liu reposted
LobeHub@lobehub·
Introducing LobeHub: Agent teammates that grow with you.

LobeHub is the ultimate space for work and life: to find, build, and collaborate with agent teammates that grow with you. We’re building the world’s first and largest human–agent co-evolving network.

Two years ago, we built LobeChat, an open-source interface for using different AI models. Today, LobeChat has 70k+ GitHub stars and serves 6M+ users worldwide. How to fully unlock the power of models has always been a shared mission between us and the community.

We started with interaction — a fundamentally new, agent-first experience. Agents are no longer passive tools invoked in a single conversation. They should be proactive, always-on units of work. Treating agents as the minimal atomic unit is also the core of our agent harness infra.

Today’s agents are mostly one-off executors. Even with memory, it’s often global — and hallucinates. We build long-term agent teammates that evolve with users. Each agent has its own dedicated memory space, editable by users, allowing humans and agents to co-evolve over time. This, in turn, allows us to design clearer rewards for reinforcement learning and create cleaner environments for continual learning.

Agent teammates can work in groups. Through a multi-agent system, agent groups operate faster, more cost-effectively, and go beyond what single-agent systems can achieve. For example, a single agent often requires heavy user involvement to proceed step by step, whereas LobeHub can execute the same work from a single instruction, with a supervisor orchestrating agents that run in parallel or debate to produce better results. We are building the collaboration network among agent teammates — and between humans and agent teammates as well.

Ease of use matters. AI intelligence and shared human intelligence are equally important. With simple instructions and tool selection, you can effortlessly build and team up with agent coworkers to deliver complex, systematic work — even assembling a quant team to execute trades. Through the LobeHub community, anyone can discover, reuse, and remix agents and agent groups, customizing them to fit their own workflows, preferences, and needs.

Last but not least, our vision started with LobeChat: multi-model support is the most efficient approach for users. We believe different models excel in different scenarios. By routing across multiple models, LobeHub improves cost efficiency and unlocks capabilities that a single-model setup cannot easily support.
82 replies · 69 reposts · 322 likes · 183.8K views
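As a generic illustration of the supervisor pattern the post above describes (one instruction fanned out to parallel agents, results merged), here is a sketch; none of the names below are LobeHub's actual API:

    # Generic supervisor/worker sketch; LobeHub's real agent API is not
    # shown in the post, so everything here is assumed.
    from concurrent.futures import ThreadPoolExecutor

    def run_agent(role: str, instruction: str) -> str:
        # Placeholder: each worker would call its own model + memory space.
        return f"[{role}] result for: {instruction}"

    def supervisor(instruction: str, roles: list[str]) -> str:
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda r: run_agent(r, instruction), roles))
        # The supervisor could also have agents debate; here we just merge drafts.
        return "\n".join(drafts)

    print(supervisor("analyze BTC funding rates", ["researcher", "quant", "critic"]))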
Xiangyan Liu reposted
Longxu Dou@LongxuDou·
🚀We propose Reptile, a Terminal Agent🤖️ that lets you interact with an LLM agent directly in your terminal. The agent can execute any command or custom CLI tool to accomplish tasks, and users can define their own tools and commands for the agent to use.

✨What makes Reptile special? Compared with other CLI agents (e.g., Claude Code and Mini SWE-Agent), Reptile stands out for the following reasons:

⚡️Human-in-the-loop learning: users can inspect every step and provide immediate feedback, i.e., give feedback under the USER role or edit the LLM generation under the ASSISTANT role. These interactions are then used for SFT and RL training.

💻Terminal-only beyond bash-only: simple, stateful execution, which is more efficient than bash-only (you don’t need to re-specify the environment in every command). It doesn’t require the complicated MCP protocol—just a naive bash tool under the REPL protocol.

Github: github.com/terminal-agent…
Homepage: terminal-agent.github.io/blog/workflow/
Longxu Dou tweet media
4 replies · 17 reposts · 22 likes · 2.6K views
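The stateful-execution point above is easy to see with a persistent pseudo-terminal: working directory and environment variables survive across commands, unlike a fresh `bash -c` per call. A sketch using pexpect, as an illustration rather than Reptile's implementation:

    # Stateful terminal vs one-shot bash: one long-lived interactive bash
    # session whose state persists across commands. Illustration only.
    from pexpect import replwrap

    shell = replwrap.bash()                        # persistent interactive bash REPL
    shell.run_command("cd /tmp && export FOO=bar")
    print(shell.run_command("pwd; echo $FOO"))     # prints /tmp and bar: state persisted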
Xiangyan Liu reposted
Fanqing Meng@FanqingMengAI·
I think it is not new.... In DeepSeek 3.2 they use expert RL -> joint SFT -> joint RL. In LongCat they use expert RL -> model merging -> joint RL. MiMo replaces the SFT stage with OPD (on-policy distillation). Everyone knows OPD is better than SFT :)
4 replies · 4 reposts · 134 likes · 72K views
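For readers outside the loop, a schematic contrast of the two objectives compared above: SFT does MLE on a fixed dataset, while on-policy distillation (OPD) scores the student's own rollouts under the teacher. A toy sketch, not any of the cited labs' training code:

    # Schematic SFT vs OPD losses; HF-style calls, for illustration only.
    import torch.nn.functional as F

    def sft_loss(student, batch):
        # MLE on fixed expert/teacher responses (off-policy data).
        return student(input_ids=batch["input_ids"], labels=batch["labels"]).loss

    def opd_loss(student, teacher, prompts):
        # The student samples its own rollouts; the teacher supplies
        # per-token targets on exactly those tokens (on-policy data).
        rollouts = student.generate(prompts)       # schematic generate call
        s_logp = F.log_softmax(student(rollouts).logits, dim=-1)
        t_logp = F.log_softmax(teacher(rollouts).logits, dim=-1)
        # Reverse KL(student || teacher) on the student's own samples.
        return F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")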
Xiangyan Liu reposted
Xiang Yue@xiangyue96·
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, depending largely on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs from scratch on synthetic GSM-like reasoning data. Here is what we found: 🧵
Xiang Yue tweet media
28 replies · 239 reposts · 1.4K likes · 325.4K views
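For reference, the pass@128 metric mentioned above is conventionally computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): draw n samples, count the c correct ones, and estimate the probability that a random size-k subset contains at least one success:

    # Unbiased pass@k estimator: pass@k = 1 - C(n-c, k) / C(n, k).
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:          # every size-k draw must contain a correct sample
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=256, c=3, k=128))   # ~0.88 even with only 3/256 correct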
Xiangyan Liu@dobogiyy·
time to embrace DLMs🤓
Jinjie Ni@NiJinjie

1/3 🚬 Ready to smell your GPUs burning? Introducing MegaDLMs, the first production-level library for training diffusion language models, offering 3× faster training speed and up to 47% MFU. Built on Megatron-LM and Transformer-Engine, it delivers near-perfect linear scaling. github.com/JinjieNi/MegaD… You can train arbitrarily large models without compromising throughput. It was also the training backend for Super Data Learners, Quokka, and OpenMoE 2. We open-sourced more good stuff; see the next thread👇

0 replies · 0 reposts · 1 like · 148 views
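For context on figures like 47% MFU: model FLOPs utilization is the achieved model FLOPs per second divided by the accelerator's peak. A back-of-envelope sketch; the model size and token rate below are made-up inputs, and only the ~6N-FLOPs-per-token rule and the ratio itself are standard:

    # Back-of-envelope MFU check with illustrative numbers.
    def mfu(params: float, tokens_per_sec: float, peak_flops: float) -> float:
        flops_per_token = 6 * params            # ~6N training FLOPs per token
        return flops_per_token * tokens_per_sec / peak_flops

    # e.g. a 7e9-param model at 11e3 tokens/s/GPU on an H100 (~989e12 BF16 FLOPs/s):
    print(f"{mfu(7e9, 11e3, 989e12):.0%}")      # -> 47%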
Xiangyan Liu reposted
Jiawei Gu@Kuvvius·
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 arxiv.org/abs/2510.27492 (1/16)
Jiawei Gu tweet media
27 replies · 64 reposts · 315 likes · 68.7K views
Xiangyan Liu reposted
Zichen Liu@zzlccc·
Nothing feels more exciting than writing a thesis proposal on RL for LLMs before 2025 ends!! Covering a subset of my first-author works done in the past 1.5 years (after switching from traditional RL to LLM RL…) Tentative title, of course
Zichen Liu tweet media
16 replies · 62 reposts · 513 likes · 60.7K views
Xiangyan Liu reposted
Brian Li@Brian_Bo_Li·
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full support for parallelism and optimization. Not just to hack⚙️, but also to scale🚀.

So finally we built it for ourselves. github.com/EvolvingLMMs-L…

LMMs-Engine: a lean, efficient framework built to train unified multimodal models at scale. From Qwen LLM, VLM, LLaVA-OV, and WanVideo, to unified models like Qwen-Omni and BAGEL — plus Linear-Attn GDN and research prototypes like RAE and SiT — all under one modular system that seamlessly integrates diverse datasets and optimization strategies.

Powered by FSDP2 multi-dim parallelism, Ulysses sequence parallel, Flash-Attention, Liger Kernels, and Native Sparse Attention (with bonus support for the Muon optimizer for all models).
9 replies · 32 reposts · 111 likes · 54.6K views
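A schematic of the plug-and-play idea the post above describes: modality encoders registered by name and projected into the LLM's embedding space, so an encoder can be swapped without touching the autoregressive backbone. The registry and dimensions are assumptions, not LMMs-Engine's actual API:

    # Minimal encoder-registry sketch; not LMMs-Engine's real interface.
    import torch, torch.nn as nn

    ENCODERS: dict[str, nn.Module] = {}

    def register(name: str):
        def deco(cls):
            ENCODERS[name] = cls()          # instantiate and store by modality name
            return cls
        return deco

    @register("image")
    class ImageEncoder(nn.Module):
        def __init__(self, dim=1024, llm_dim=4096):
            super().__init__()
            self.proj = nn.Linear(dim, llm_dim)   # adapter into the LLM token space
        def forward(self, feats):                 # feats: (batch, patches, dim)
            return self.proj(feats)

    def embed_multimodal(llm_embeds, modality, feats):
        # Swap encoders freely; the AR backbone only sees embedding-space tokens.
        return torch.cat([ENCODERS[modality](feats), llm_embeds], dim=1)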
Xiangyan Liu reposted
Andrej Karpathy@karpathy·
I very much hope you continue working on RL! I think it's a misunderstanding that I am suggesting we need some kind of a replacement for RL. That's not accurate, and I tried to clear it up but did so poorly - they layer.

Layer 1 was base model autocomplete. Layer 2 was instruct finetuning (SFT), creating assistants in style (InstructGPT paper). Layer 3 is reinforcement learning (RL), allowing us to essentially optimize over the sampling loop too, driving away undesirable behaviors like hallucinations and stuck repetition loops, and eliciting "move 37"-like behaviors that would be really hard to SFT into the model, e.g. reasoning.

I think that each of these layers will stick around as a stage in the final solution, but I am suggesting that we need additional layers and ideas: 4, 5, 6, etc. The final AGI recipe includes a reinforcement learning stage, just as humans utilize reinforcement learning for all kinds of behaviors, as a powerful tool in the toolbox.
59 replies · 141 reposts · 2.9K likes · 262.4K views
Xiangyan Liu reposted
alex zhang@a1zhang·
What if scaling the context windows of frontier LLMs is much easier than it sounds?

We’re excited to share our work on Recursive Language Models (RLMs): a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length through a REPL environment.

On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double!) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their “prompt” and answer highly compositional queries without degradation — even better than explicit indexing/retrieval.

We link our blogpost, (still very early!) experiments, and discussion below.
alex zhang tweet media
135 replies · 377 reposts · 2.7K likes · 947.5K views
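A toy sketch of the recursive idea above: the long prompt is held as data, the model answers over manageable chunks, then combines the partial answers with one more call. The halving strategy and the ask/complete helpers are assumptions, not the authors' implementation:

    # Toy recursive decomposition over an unbounded prompt; `model.complete`
    # is a hypothetical text-completion helper.
    def ask(model, text: str, question: str) -> str:
        return model.complete(f"{text}\n\nQ: {question}\nA:")

    def rlm(model, prompt: str, question: str, limit: int = 8000) -> str:
        if len(prompt) <= limit:                   # base case: fits in one call
            return ask(model, prompt, question)
        mid = len(prompt) // 2
        left = rlm(model, prompt[:mid], question, limit)    # recurse on halves
        right = rlm(model, prompt[mid:], question, limit)
        return ask(model, f"Partial answers:\n1) {left}\n2) {right}", question)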
Xiangyan Liu reposted
Min Lin@mavenlin·
LLMs don't need MCPs, they need a terminal. Not the bash/shell tool that codex/claude are already using, but a real tty emulator, used the same way humans use it, i.e. capable of running any REPL interactively, as we will show in the thread.
7 replies · 15 reposts · 48 likes · 9.6K views
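What a real tty buys the model is the ability to drive any interactive REPL statefully, not just fire one-shot shell commands. An illustration with pexpect, not the thread's actual tooling:

    # Driving an interactive Python REPL through a pty: state lives
    # inside the REPL process across turns.
    import pexpect

    py = pexpect.spawn("python3", encoding="utf-8")
    py.expect(">>> ")
    py.sendline("x = 21 * 2")            # define state in turn one
    py.expect(">>> ")
    py.sendline("print(x)")              # use it in turn two
    py.expect(">>> ")
    print(py.before.splitlines()[-1])    # -> 42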
Xiangyan Liu reposted
Fengzhuo Zhang@FengzhuoZhang·
Why does Muon outperform Adam—and how?

🚀Answer: Muon Outperforms Adam in Tail-End Associative Memory Learning

Three Key Findings:
> Associative memory parameters are the main beneficiaries of Muon, compared to Adam.
> Muon yields more isotropic weights than Adam.
> In heavy-tailed tasks, Muon significantly improves tail-class learning compared to Adam.

Paper Link: arxiv.org/pdf/2509.26030

A thread 🧵
Fengzhuo Zhang tweet media
1 reply · 29 reposts · 56 likes · 7.2K views
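For readers new to Muon: its core update orthogonalizes the momentum matrix with a Newton-Schulz iteration before applying it, which is what pushes weights toward the isotropy the post above measures. A sketch using the published quintic coefficients; training-loop plumbing is omitted:

    # Sketch of the Muon update: orthogonalize momentum, then step.
    import torch

    def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        # Quintic iteration pushing all singular values of G toward 1
        # (coefficients from the Muon reference implementation).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)
        tall = X.size(0) > X.size(1)
        if tall:
            X = X.T                      # iterate in the wide orientation
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if tall else X

    def muon_step(weight, momentum, grad, lr=0.02, beta=0.95):
        # SGD-momentum accumulation, then an orthogonalized (isotropic) step.
        momentum.mul_(beta).add_(grad)
        weight.add_(newton_schulz(momentum), alpha=-lr)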