Linyi Yang

117 posts

Linyi Yang @linyi_yang
SUSTech
Shenzhen · Joined June 2016
576 Following · 495 Followers
Linyi Yang retweeted
Xidong Feng @Xidong_Feng
We've witnessed a crazy concurrent line of work on on-policy self-distillation in LLMs, and I truly believe this is the next paradigm of RL. Back in 2024, we proposed this exact conceptual shift in our paper, Natural Language Reinforcement Learning (NLRL).

The real breakthrough here isn't just the specific distillation mechanics. It's that RL is fundamentally shifting away from the traditional "sample -> then filter or amplify" approach. Instead of passively waiting to stumble upon a good action to upweight, the field is moving toward true synthetic language data generation from experience, which enables true continual learning.

You can see this exact recipe playing out across all the recent hit papers:
• RLTF (2602.02482): Text critiques as privileged info
• OPSD (2601.18734): Ground-truth solutions
• SDPO (2601.20802): Runtime errors & execution feedback
• ERL (2602.13949): Self-reflections & demonstrations

Instead of just using a scalar reward to filter bad rollouts, they all use language feedback to explicitly generate a corrected, high-quality trajectory in hindsight, and then distill that competence back into the base policy.

While the specific ways we adapt RL to LLMs are still rapidly evolving, the core vision we outlined in NLRL holds true today: a single scalar is simply too poor a carrier for credit assignment.

When people talk about "experiential memory" for agents today, they are essentially describing what we framed as a Language Value Function (LVF): not just RAG over past episodes, but storing the structured, strategy-level "why" behind what worked. And what we called "Language Policy Improvement" is exactly the feedback-aware self-distillation loop we see everywhere now.

Language, not scalars, is the future of RL.

📄 Check out our early exploration of this framework here: arxiv.org/abs/2411.14251
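To make the recipe concrete, here is a minimal sketch of the generate-correct-distill loop the thread describes; every name is a hypothetical stand-in, not any of the cited papers' actual APIs:

```python
# Minimal sketch of a feedback-aware self-distillation round.
# All callables (policy, feedback, sft_update) are caller-supplied stubs.

def hindsight_correct(rollout, feedback):
    """Use language feedback (a critique, runtime error, or reference
    solution) to produce a corrected trajectory in hindsight (stub)."""
    return feedback(rollout)

def self_distill_round(policy, prompts, feedback, sft_update):
    """One round: sample rollouts on-policy, correct them with language
    feedback instead of filtering by a scalar reward, then distill the
    corrected trajectories back into the policy via supervised updates."""
    corrected = []
    for prompt in prompts:
        rollout = policy(prompt)  # on-policy sample
        corrected.append((prompt, hindsight_correct(rollout, feedback)))
    return sft_update(policy, corrected)  # distillation step
```

The contrast with classic RL is in the data: rather than reweighting whatever rollouts happened to score well, the loop synthesizes the trajectory it wished it had sampled and trains on that.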
6 replies · 28 reposts · 205 likes · 31.6K views
Linyi Yang retweeted
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
1.7K replies · 738 reposts · 13.6K likes · 6.5M views
Linyi Yang retweeted
DAIR.AI @dair_ai
On building more powerful self-evolving agents.

LLM agents struggle to learn from experience after deployment. Fine-tuning is expensive and causes catastrophic forgetting. RAG retrieves based on semantic similarity alone, often pulling noise instead of what actually works. Similarity and utility are not the same thing.

This new research introduces MemRL, a framework that enables agents to self-evolve through non-parametric reinforcement learning on episodic memory, keeping the LLM completely frozen. The core idea is to treat memory retrieval as a decision-making problem, not a matching problem.

Each memory stores an Intent-Experience-Utility triplet. The utility is a learned Q-value representing expected returns, continuously refined through environmental feedback.

MemRL implements Two-Phase Retrieval. First, filter candidates by semantic similarity to ensure relevance. Then, rank by learned Q-values to select what actually works. This distinguishes high-value strategies from semantically similar noise.

When the agent succeeds or fails, it updates the Q-values of retrieved memories using Bellman-style backups. No gradient updates to model weights. The frozen LLM provides stable reasoning while the memory evolves plastically.

Results across four benchmarks: on HLE (knowledge-frontier tasks), MemRL significantly outperforms both RAG and existing memory systems like MemP. The pattern holds on BigCodeBench for code generation, ALFWorld for exploration tasks, and Lifelong Agent Bench for OS and database operations.

Analysis confirms a strong correlation between learned utility scores and actual task success, validating that Q-values capture genuine functional value rather than superficial similarity.

Why does it matter? Decoupling stable reasoning from plastic memory enables continuous runtime improvement without the catastrophic forgetting or computational costs of fine-tuning.

Paper: arxiv.org/abs/2601.03192
Learn to build effective AI agents in our academy: dair-ai.thinkific.com
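A minimal sketch of the two-phase retrieval and Bellman-style backup described above, under stated assumptions: `similarity` is a caller-supplied embedding comparison, and the class, names, and constants are illustrative, not MemRL's actual API:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    intent: str           # what the episode tried to do
    experience: str       # what was done and what happened
    q_value: float = 0.0  # learned utility: expected return, not similarity

class MemoryStore:
    def __init__(self, alpha=0.1, gamma=0.9):
        self.memories, self.alpha, self.gamma = [], alpha, gamma

    def retrieve(self, query, similarity, k_sim=20, k_final=4):
        # Phase 1: filter candidates by semantic similarity (relevance only).
        pool = sorted(self.memories,
                      key=lambda m: similarity(query, m.intent),
                      reverse=True)[:k_sim]
        # Phase 2: rank the relevant pool by learned Q-value (utility).
        return sorted(pool, key=lambda m: m.q_value, reverse=True)[:k_final]

    def update(self, retrieved, reward, next_value=0.0):
        # Bellman-style backup on the memories that were actually used;
        # the LLM's weights are never touched.
        for m in retrieved:
            m.q_value += self.alpha * (reward + self.gamma * next_value - m.q_value)
```

The point of the two sorts is that similarity only gates relevance; the final ranking is by learned utility, so a semantically close but useless memory loses to a slightly less similar one that historically paid off.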
15 replies · 48 reposts · 281 likes · 27.4K views
Linyi Yang retweeted
Rohan Paul @rohanpaul_ai
🧮 Google just published DS-STAR, a data science agent that can automate a range of tasks, from statistical analysis to visualization and data wrangling. It reads messy files, plans steps, writes and runs code, and verifies itself, reaching state of the art on tough multi-file tasks.

It lifts accuracy to 45.2% on DABStep, 44.7% on KramaBench, and 38.5% on DA-Code, and holds first place on DABStep as of September 2025.

Earlier agents lean on clean CSVs and struggle when answers are split across JSON, markdown, and free text. DS-STAR begins by scanning a directory and producing a plain-language summary of each file's structure and contents, which becomes shared context. A Planner proposes steps, a Coder writes Python, a Verifier checks sufficiency, a Router fixes mistakes or adds steps, and the loop stops when it passes or reaches 10 rounds.

This setup handles heterogeneous data because the summaries surface schema, types, keys, and hints, so plans refer to real fields instead of guessing.

On benchmarks the gains are steady: from 41.0% to 45.2% on DABStep, from 39.8% to 44.7% on KramaBench, and from 37.0% to 38.5% on DA-Code. Ablations explain the lift: removing the Data File Analyzer drops hard-task accuracy on DABStep to 26.98%, and removing the Router also hurts across easy and hard tasks.

Refinement depth matches difficulty: hard tasks average 5.6 rounds, easy tasks average 3.0 rounds, and over 50% of easy tasks finish in 1 round. The framework generalizes across base models, with a GPT-5 version doing better on easy items and a Gemini-2.5-Pro version doing better on hard items.

Net effect: DS-STAR reduces the gap between messy data and reliable answers across CSV, XLSX, JSON, markdown, and plain text.
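A minimal sketch of the planner-coder-verifier-router loop described above, with the 10-round cap; every callable and the verifier's return shape are hypothetical stand-ins, not DS-STAR's actual interface:

```python
def refine_loop(task, analyzer, planner, coder, runner, verifier, router,
                max_rounds=10):
    """Plan -> code -> verify -> route, capped at 10 rounds as in the post.
    All components are caller-supplied callables (illustrative only)."""
    context = analyzer(task)        # plain-language summary of each file
    plan = planner(task, context)
    result = None
    for _ in range(max_rounds):
        code = coder(plan, context)                  # Coder writes Python
        result = runner(code)                        # execute, collect output
        ok, feedback = verifier(task, plan, result)  # sufficiency check
        if ok:
            break
        plan = router(plan, feedback)  # fix a wrong step or add a new one
    return result
```

Running the file analyzer once up front and sharing its summaries is what lets the plan reference real schema fields across JSON, markdown, and free text instead of guessing.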
15 replies · 134 reposts · 937 likes · 65.5K views
Linyi Yang @linyi_yang
Thanks for sharing our work!
Akshay 🚀 @akshay_pachaar

Fine-tuning LLM Agents without Fine-tuning LLMs!

Imagine improving your AI agent's performance from experience without ever touching the model weights. It's just like how humans remember past episodes and learn from them. That's precisely what Memento does.

The core concept: instead of updating LLM weights, Memento learns from experiences using memory. It reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. Think of it as giving your agent a notebook to remember what worked and what didn't!

How does it work? The system breaks down into two key components:

1️⃣ Case-Based Reasoning (CBR): Decomposes complex tasks into sub-tasks and retrieves relevant past experiences. No gradients needed, just smart memory retrieval!

2️⃣ Executor: Executes each subtask using MCP tools and records outcomes in memory for future reference. Through MCP, the executor can accomplish most real-world tasks and has access to the following tools:
🔍 Web research
📄 Document handling
🐍 Safe Python execution
📊 Data analysis
🎥 Media processing

I found this to be a really good path toward building human-like agents.

👉 Over to you: what are your thoughts? I have shared the relevant links in the next tweet!
_____
Share this with your network if you found this insightful ♻️

Find me → @akshay_pachaar for more insights and tutorials on AI and Machine Learning!
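A minimal sketch of the "notebook" idea described above; the class and its scoring rule are illustrative assumptions, not Memento's actual implementation:

```python
class CaseBank:
    """Notebook of past episodes: no gradient updates, learning happens
    through retrieval. Names and scoring are illustrative stand-ins."""
    def __init__(self):
        self.cases = []  # (subtask, outcome, succeeded) triples

    def record(self, subtask, outcome, succeeded):
        # The executor logs each subtask's outcome for future reference.
        self.cases.append((subtask, outcome, succeeded))

    def retrieve(self, subtask, similarity, k=4):
        # Prefer cases that are both similar to the new subtask and that
        # actually worked: a simple stand-in for case-based reasoning.
        ranked = sorted(
            self.cases,
            key=lambda c: similarity(subtask, c[0]) + (1.0 if c[2] else 0.0),
            reverse=True)
        return ranked[:k]
```

Retrieval rather than gradient descent is what keeps the base LLM frozen: the notebook grows while the weights stay fixed.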

0 replies · 0 reposts · 3 likes · 324 views
Linyi Yang retweeted
Matthias Samwald @matthiassamwald
1 reply · 1 repost · 4 likes · 286 views
elvis @omarsar0
Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and very cool memory technique to improve deep research agents. Great for continuous, real-time learning without gradient updates. Here are my notes:
52 replies · 211 reposts · 1.2K likes · 139.5K views
Linyi Yang retweeted
Dr Singularity @Dr_Singularity
Cool paper. How Far Are AI Scientists from Changing the World?
9 replies · 31 reposts · 162 likes · 9.2K views
WestlakeNLP @NlpWestlake
WE WILL HAVE A SPECIAL GUEST AT ACL'S POSTER SESSION THIS MORNING!! Come see our presenter Yue Zhang at Hall 4/5 from 11:00 a.m. Details are shown below:
2 replies · 0 reposts · 7 likes · 629 views
Linyi Yang @linyi_yang
We are at board 312, Hall X4, at the ACL 2025 poster session. Prof. Yue Zhang will present our work DeepReview himself! Come by and discuss!!
0 replies · 0 reposts · 2 likes · 231 views
Linyi Yang retweeted
Carlos E. Perez @IntuitMachine
A curated collection of papers exploring the path towards Deep Research Agents.
3 replies · 55 reposts · 308 likes · 28.1K views
Linyi Yang retweeted
ACL 2026 @aclmeeting
Panel alert! 🎉 Join us at #ACL2025NLP for a discussion with NLP experts Mirella Lapata, Dan Roth, and Yue Zhang, and our moderator Eduard Hovy! More info coming soon! 👀🔥 👉 2025.aclweb.org/program/panel/
1 reply · 5 reposts · 36 likes · 3.4K views
Linyi Yang retweeted
CLS @ChengleiSi
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12 replies · 163 reposts · 635 likes · 152.1K views
Linyi Yang retweeted
Yuchen Jin @Yuchenj_UW
Many PhDs (my past self included) fall into the trap of thinking that publishing in top-tier conferences is the ultimate goal. But publishing ≠ impact.

Muon was just a blog post. It got Keller into OpenAI; he might be training GPT-5 with it now. I'm grateful he listed me as 2nd author. I just ran NanoGPT experiments to test Muon's scalability on larger LLMs, and it crushed AdamW (the old king of optimizers)!

Lesson: Optimize for impact, not prestige. In research, and in life.
Keller Jordan @kellerjordan0

The reason I didn't write a proper arxiv paper for Muon is because I simply don't think there's any relationship between the ability to publish a paper with lots of good-looking results about a new optimizer, and whether that optimizer actually works. I only trust speedruns.

37 replies · 97 reposts · 1.4K likes · 232.9K views