Linyi Yang

117 posts

Linyi Yang @linyi_yang
SUSTech
Shenzhen · Joined June 2016
576 Following · 495 Followers
Linyi Yang retweeted
Xidong Feng @Xidong_Feng
We've witnessed a crazy concurrent line of work on on-policy self-distillation in LLMs, and I truly believe this is the next paradigm of RL. Back in 2024, we proposed this exact conceptual shift in our paper, Natural Language Reinforcement Learning (NLRL).

The real breakthrough here isn't just the specific distillation mechanics. It's that RL is fundamentally shifting away from the traditional "sample -> then filter or amplify" approach. Instead of passively waiting to stumble upon a good action to upweight, the field is moving toward true synthetic language data generation from experience, which enables true continual learning.

You can see this exact recipe playing out across all the recent hit papers:
• RLTF (2602.02482): Text critiques as privileged info
• OPSD (2601.18734): Ground-truth solutions
• SDPO (2601.20802): Runtime errors & execution feedback
• ERL (2602.13949): Self-reflections & demonstrations

Instead of just using a scalar reward to filter bad rollouts, they all use language feedback to explicitly generate a corrected, high-quality trajectory in hindsight, and then distill that competence back into the base policy.

While the specific ways we adapt RL to LLMs are still rapidly evolving, the core vision we outlined in NLRL holds true today: a single scalar is simply too poor a carrier for credit assignment.

When people talk about "experiential memory" for agents today, they are essentially describing what we framed as a Language Value Function (LVF): not just RAG over past episodes, but storing the structured, strategy-level "why" behind what worked. And what we called "Language Policy Improvement" is exactly the feedback-aware self-distillation loop we see everywhere now.

Language, not scalars, is the future of RL.

📄 Check out our early exploration of this framework here: arxiv.org/abs/2411.14251
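To make the recipe concrete, here is a minimal sketch of the generate-correct-distill loop the thread describes; every name is a hypothetical stand-in, not any of the cited papers' actual APIs:

```python
# Minimal sketch of a feedback-aware self-distillation round.
# All callables (policy, feedback, sft_update) are caller-supplied stubs.

def hindsight_correct(rollout, feedback):
    """Use language feedback (a critique, runtime error, or reference
    solution) to produce a corrected trajectory in hindsight (stub)."""
    return feedback(rollout)

def self_distill_round(policy, prompts, feedback, sft_update):
    """One round: sample rollouts on-policy, correct them with language
    feedback instead of filtering by a scalar reward, then distill the
    corrected trajectories back into the policy via supervised updates."""
    corrected = []
    for prompt in prompts:
        rollout = policy(prompt)  # on-policy sample
        corrected.append((prompt, hindsight_correct(rollout, feedback)))
    return sft_update(policy, corrected)  # distillation step
```

The contrast with classic RL is in the data: rather than reweighting whatever rollouts happened to score well, the loop synthesizes the trajectory it wished it had sampled and trains on that.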
6 replies · 28 reposts · 205 likes · 31.6K views
Linyi Yang retweeted
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
1.7K replies · 738 reposts · 13.6K likes · 6.5M views
Linyi Yang retweeted
DAIR.AI @dair_ai
On building more powerful self-evolving agents.

LLM agents struggle to learn from experience after deployment. Fine-tuning is expensive and causes catastrophic forgetting. RAG retrieves based on semantic similarity alone, often pulling noise instead of what actually works. Similarity and utility are not the same thing.

This new research introduces MemRL, a framework that enables agents to self-evolve through non-parametric reinforcement learning on episodic memory, keeping the LLM completely frozen. The core idea is to treat memory retrieval as a decision-making problem, not a matching problem.

Each memory stores an Intent-Experience-Utility triplet. The utility is a learned Q-value representing expected returns, continuously refined through environmental feedback.

MemRL implements Two-Phase Retrieval. First, filter candidates by semantic similarity to ensure relevance. Then, rank by learned Q-values to select what actually works. This distinguishes high-value strategies from semantically similar noise.

When the agent succeeds or fails, it updates the Q-values of retrieved memories using Bellman-style backups. No gradient updates to model weights. The frozen LLM provides stable reasoning while the memory evolves plastically.

Results across four benchmarks: on HLE (knowledge-frontier tasks), MemRL significantly outperforms both RAG and existing memory systems like MemP. The pattern holds on BigCodeBench for code generation, ALFWorld for exploration tasks, and Lifelong Agent Bench for OS and database operations.

Analysis confirms a strong correlation between learned utility scores and actual task success, validating that Q-values capture genuine functional value rather than superficial similarity.

Why does it matter? Decoupling stable reasoning from plastic memory enables continuous runtime improvement without the catastrophic forgetting or computational costs of fine-tuning.

Paper: arxiv.org/abs/2601.03192
Learn to build effective AI agents in our academy: dair-ai.thinkific.com
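A minimal sketch of the two-phase retrieval and Bellman-style backup described above, under stated assumptions: `similarity` is a caller-supplied embedding comparison, and the class, names, and constants are illustrative, not MemRL's actual API:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    intent: str           # what the episode tried to do
    experience: str       # what was done and what happened
    q_value: float = 0.0  # learned utility: expected return, not similarity

class MemoryStore:
    def __init__(self, alpha=0.1, gamma=0.9):
        self.memories, self.alpha, self.gamma = [], alpha, gamma

    def retrieve(self, query, similarity, k_sim=20, k_final=4):
        # Phase 1: filter candidates by semantic similarity (relevance only).
        pool = sorted(self.memories,
                      key=lambda m: similarity(query, m.intent),
                      reverse=True)[:k_sim]
        # Phase 2: rank the relevant pool by learned Q-value (utility).
        return sorted(pool, key=lambda m: m.q_value, reverse=True)[:k_final]

    def update(self, retrieved, reward, next_value=0.0):
        # Bellman-style backup on the memories that were actually used;
        # the LLM's weights are never touched.
        for m in retrieved:
            m.q_value += self.alpha * (reward + self.gamma * next_value - m.q_value)
```

The point of the two sorts is that similarity only gates relevance; the final ranking is by learned utility, so a semantically close but useless memory loses to a slightly less similar one that historically paid off.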
15 replies · 48 reposts · 281 likes · 27.4K views
Linyi Yang retweeted
Rohan Paul @rohanpaul_ai
🧮 Google just published DS-STAR, a data science agent that can automate a range of tasks, from statistical analysis to visualization and data wrangling. It reads messy files, plans steps, writes and runs code, and verifies itself, reaching state of the art on tough multi-file tasks.

It lifts accuracy to 45.2% on DABStep, 44.7% on KramaBench, and 38.5% on DA-Code, and holds first place on DABStep as of September 2025.

Earlier agents lean on clean CSVs and struggle when answers are split across JSON, markdown, and free text. DS-STAR begins by scanning a directory and producing a plain-language summary of each file's structure and contents, which becomes shared context. A Planner proposes steps, a Coder writes Python, a Verifier checks sufficiency, a Router fixes mistakes or adds steps, and the loop stops when it passes or reaches 10 rounds.

This setup handles heterogeneous data because the summaries surface schema, types, keys, and hints, so plans refer to real fields instead of guessing.

On benchmarks the gains are steady: from 41.0% to 45.2% on DABStep, from 39.8% to 44.7% on KramaBench, and from 37.0% to 38.5% on DA-Code. Ablations explain the lift: removing the Data File Analyzer drops hard-task accuracy on DABStep to 26.98%, and removing the Router also hurts across easy and hard tasks.

Refinement depth matches difficulty: hard tasks average 5.6 rounds, easy tasks average 3.0 rounds, and over 50% of easy tasks finish in 1 round. The framework generalizes across base models, with a GPT-5 version doing better on easy items and a Gemini-2.5-Pro version doing better on hard items.

Net effect: DS-STAR reduces the gap between messy data and reliable answers across CSV, XLSX, JSON, markdown, and plain text.
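A minimal sketch of the planner-coder-verifier-router loop described above, with the 10-round cap; every callable and the verifier's return shape are hypothetical stand-ins, not DS-STAR's actual interface:

```python
def refine_loop(task, analyzer, planner, coder, runner, verifier, router,
                max_rounds=10):
    """Plan -> code -> verify -> route, capped at 10 rounds as in the post.
    All components are caller-supplied callables (illustrative only)."""
    context = analyzer(task)        # plain-language summary of each file
    plan = planner(task, context)
    result = None
    for _ in range(max_rounds):
        code = coder(plan, context)                  # Coder writes Python
        result = runner(code)                        # execute, collect output
        ok, feedback = verifier(task, plan, result)  # sufficiency check
        if ok:
            break
        plan = router(plan, feedback)  # fix a wrong step or add a new one
    return result
```

Running the file analyzer once up front and sharing its summaries is what lets the plan reference real schema fields across JSON, markdown, and free text instead of guessing.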
15 replies · 134 reposts · 937 likes · 65.5K views
Linyi Yang @linyi_yang
Thanks for sharing our work!
Akshay 🚀 @akshay_pachaar

Fine-tuning LLM Agents without Fine-tuning LLMs!

Imagine improving your AI agent's performance from experience without ever touching the model weights. It's just like how humans remember past episodes and learn from them. That's precisely what Memento does.

The core concept: instead of updating LLM weights, Memento learns from experiences using memory. It reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. Think of it as giving your agent a notebook to remember what worked and what didn't!

How does it work? The system breaks down into two key components:

1️⃣ Case-Based Reasoning (CBR): Decomposes complex tasks into sub-tasks and retrieves relevant past experiences. No gradients needed, just smart memory retrieval!

2️⃣ Executor: Executes each subtask using MCP tools and records outcomes in memory for future reference. Through MCP, the executor can accomplish most real-world tasks and has access to the following tools:
🔍 Web research
📄 Document handling
🐍 Safe Python execution
📊 Data analysis
🎥 Media processing

I found this to be a really good path toward building human-like agents.

👉 Over to you: what are your thoughts? I have shared the relevant links in the next tweet!
_____
Share this with your network if you found this insightful ♻️

Find me → @akshay_pachaar for more insights and tutorials on AI and Machine Learning!
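A minimal sketch of the "notebook" idea described above; the class and its scoring rule are illustrative assumptions, not Memento's actual implementation:

```python
class CaseBank:
    """Notebook of past episodes: no gradient updates, learning happens
    through retrieval. Names and scoring are illustrative stand-ins."""
    def __init__(self):
        self.cases = []  # (subtask, outcome, succeeded) triples

    def record(self, subtask, outcome, succeeded):
        # The executor logs each subtask's outcome for future reference.
        self.cases.append((subtask, outcome, succeeded))

    def retrieve(self, subtask, similarity, k=4):
        # Prefer cases that are both similar to the new subtask and that
        # actually worked: a simple stand-in for case-based reasoning.
        ranked = sorted(
            self.cases,
            key=lambda c: similarity(subtask, c[0]) + (1.0 if c[2] else 0.0),
            reverse=True)
        return ranked[:k]
```

Retrieval rather than gradient descent is what keeps the base LLM frozen: the notebook grows while the weights stay fixed.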

0 replies · 0 reposts · 3 likes · 324 views
Linyi Yang retweeted
Matthias Samwald @matthiassamwald
1 reply · 1 repost · 4 likes · 286 views
elvis @omarsar0
Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and very cool memory technique to improve deep research agents. Great for continuous, real-time learning without gradient updates. Here are my notes:
52 replies · 211 reposts · 1.2K likes · 139.5K views
Linyi Yang retweeted
Dr Singularity @Dr_Singularity
Cool paper. How Far Are AI Scientists from Changing the World?
9 replies · 31 reposts · 162 likes · 9.2K views
WestlakeNLP @NlpWestlake
WE WILL HAVE A SPECIAL GUEST AT ACL'S POSTER SESSION THIS MORNING!! Come see our presenter Yue Zhang at Hall 4/5 from 11:00 a.m. Details are shown below:
2 replies · 0 reposts · 7 likes · 629 views
Linyi Yang @linyi_yang
We are at board 312, Hall X4, at the ACL 2025 poster session. Prof. Yue Zhang will present our work DeepReview himself! Come by and discuss!!
0 replies · 0 reposts · 2 likes · 231 views
Linyi Yang retweeted
Carlos E. Perez @IntuitMachine
A curated collection of papers exploring the path towards Deep Research Agents.
3 replies · 55 reposts · 308 likes · 28.1K views
Linyi Yang retweeted
ACL 2026 @aclmeeting
Panel alert! 🎉 Join us at #ACL2025NLP for a discussion with NLP experts Mirella Lapata, Dan Roth, and Yue Zhang, and our moderator Eduard Hovy! More info coming soon! 👀🔥 👉 2025.aclweb.org/program/panel/
1 reply · 5 reposts · 36 likes · 3.4K views
Linyi Yang retweeted
CLS @ChengleiSi
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12 replies · 163 reposts · 635 likes · 152.1K views
Linyi Yang retweeted
Yuchen Jin @Yuchenj_UW
Many PhDs (my past self included) fall into the trap of thinking that publishing in top-tier conferences is the ultimate goal. But publishing ≠ impact.

Muon was just a blog post. It got Keller into OpenAI; he might be training GPT-5 with it now. I'm grateful he listed me as 2nd author. I just ran NanoGPT experiments to test Muon's scalability on larger LLMs, and it crushed AdamW (the old king of optimizers)!

Lesson: Optimize for impact, not prestige. In research, and in life.
Keller Jordan @kellerjordan0

The reason I didn't write a proper arxiv paper for Muon is because I simply don't think there's any relationship between the ability to publish a paper with lots of good-looking results about a new optimizer, and whether that optimizer actually works. I only trust speedruns.

37 replies · 97 reposts · 1.4K likes · 232.9K views