Patrick (Pengcheng) Jiang
@patpcj
57 posts
CS PhD @ UIUC; prev: SR @GoogleResearch; research: Agentic AI, Knowledge Indexing, Retrieval; recent work: DeepRetrieval, s3; open to chat
Joined October 2024
478 Following · 201 Followers
Patrick (Pengcheng) Jiang reposted
Alex Prompter @alex_prompter
This paper from Stanford and Harvard explains why most “agentic AI” systems feel impressive in demos and then completely fall apart in real use.

The core argument is simple and uncomfortable: agents don’t fail because they lack intelligence. They fail because they don’t adapt. The research shows that most agents are built to execute plans, not revise them. They assume the world stays stable. Tools work as expected. Goals remain valid. Once any of that changes, the agent keeps going anyway, confidently making the wrong move over and over.

The authors draw a clear line between execution and adaptation. Execution is following a plan. Adaptation is noticing the plan is wrong and changing behavior mid-flight. Most agents today only do the first.

A few key insights stood out:

- Adaptation is not fine-tuning. These agents are not retrained. They adapt by monitoring outcomes, recognizing failure patterns, and updating strategies while the task is still running.
- Rigid tool use is a hidden failure mode. Agents that treat tools as fixed options get stuck. Agents that can re-rank, abandon, or switch tools based on feedback perform far better.
- Memory beats raw reasoning. Agents that store short, structured lessons from past successes and failures outperform agents that rely on longer chains of reasoning. Remembering what worked matters more than thinking harder.

The takeaway is blunt. Scaling agentic AI is not about larger models or more complex prompts. It’s about systems that can detect when reality diverges from their assumptions and respond intelligently instead of pushing forward blindly. Most “autonomous agents” today don’t adapt. They execute. And execution without adaptation is just automation with better marketing.
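The tool-switching behavior described above can be sketched in a few lines of toy Python. This is my own illustration, not the paper's algorithm: an agent that demotes tools based on observed failures instead of re-executing a fixed plan.

```python
def run_with_adaptation(task, tools, max_steps=10):
    """Toy execution-vs-adaptation loop (illustrative only): try a tool,
    observe the outcome, and demote tools that keep failing instead of
    blindly re-running the original plan."""
    failures = {t: 0 for t in tools}  # short, structured "lessons"
    for _ in range(max_steps):
        tool = min(tools, key=lambda t: failures[t])  # re-rank by feedback
        ok, result = tool(task)
        if ok:
            return result
        failures[tool] += 1  # remember what did not work
    return None
```

A pure executor would call the same first tool ten times; this loop abandons it after one observed failure.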
DAIR.AI @dair_ai
First comprehensive framework for how AI agents actually improve through adaptation. While there is a lot of hype about building bigger models, the research reveals a different lever: systematic adaptation of agents and their tools.

Researchers from many universities surveyed the rapidly expanding landscape of agentic AI adaptation. What they found: a fragmented field with no unified understanding of how agents learn to use tools, when to adapt the agent versus the tool, and which strategies work for which scenarios. These are all important for building production-ready AI agents.

Adaptation in agentic AI follows four distinct paradigms that most practitioners conflate or ignore entirely. The framework organizes all adaptation strategies into two dimensions.

> Agent Adaptation (A1, A2): modifying the agent's parameters, representations, or policies.
> Tool Adaptation (T1, T2): optimizing external components like retrievers, planners, and memory modules while keeping the agent frozen.

Let's discuss each in more detail:

A1: Tool-Execution-Signaled Agent Adaptation. The agent learns from verifiable outcomes produced by the tools it invokes: code sandbox results, retrieval relevance scores, and API call outcomes. Methods like Toolformer, ToolLLM, and DeepRetrieval fall here. The signal comes from whether the tool execution succeeded, not whether the final answer was correct.

A2: Agent-Output-Signaled Agent Adaptation. The agent optimizes based on evaluations of its own final outputs. This includes both tool-free reasoning (DeepSeek-R1, Kimi-1.5) and tool-augmented adaptation (ReTool, Search-R1). The signal comes from answer correctness or preference scores, not intermediate tool calls.

T1: Agent-Agnostic Tool Adaptation. Tools are trained independently of any specific agent, including HuggingGPT, ViperGPT, and classic ML tools that serve as plug-and-play modules. These tools generalize well across different agents but may not be optimized for any particular one.

T2: Agent-Supervised Tool Adaptation. Tools are adapted using signals from a frozen agent's outputs. This includes reward-driven retriever tuning, adaptive search subagents, and memory-update modules like Reflexion and Memento. The agent stays fixed while tools learn to better support its reasoning.

The trade-offs between paradigms are explicit. Cost and flexibility: A1/A2 require substantial compute for training billion-parameter models but offer maximal flexibility. T1/T2 optimize external components at lower cost but may hit ceilings set by the frozen agent's capabilities.

Generalization patterns differ significantly. T1 tools trained on broad distributions generalize well across agents and tasks. A1 methods risk overfitting to specific environments unless carefully regularized. T2 approaches enable independent tool upgrades without agent retraining, facilitating continuous improvement.

The researchers identify when each paradigm fits. A1 suits scenarios with verifiable tool outputs like code execution or database queries. A2 works when only final answer quality matters. T1 applies when tools must serve multiple agents. T2 excels when the agent is fixed but tool performance is the bottleneck.

State-of-the-art systems increasingly combine paradigms. A deep research system might use T1-style pretrained retrievers, T2-style adaptive search agents trained via frozen-LLM feedback, and A1-style reasoning agents fine-tuned with execution feedback, all in a cascaded architecture.

Four open challenges remain unsolved:
- Co-adaptation: jointly optimizing agents and tools remains underexplored.
- Continual adaptation: enabling lifelong learning without catastrophic forgetting.
- Safe adaptation: preventing harmful behaviors during optimization.
- Efficient adaptation: reducing computational costs while maintaining performance.

The choice of adaptation paradigm fundamentally shapes what an agentic system can learn, how fast it improves, and whether improvements transfer across tasks. Teams building production agents need a principled framework for these decisions, not ad-hoc choices.

Report: github.com/pat-jj/Awesome…
Learn to build effective AI agents in our academy: dair-ai.thinkific.com/courses/buildi…
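As a memory aid, the survey's two-axis framework can be written down as a tiny lookup table. The string labels and key names below are my own shorthand, not code from the survey:

```python
# Two axes: what is adapted (agent vs. tool) and what supplies the
# learning signal. The four cells are the survey's A1/A2/T1/T2 paradigms.
PARADIGMS = {
    ("agent", "tool_execution"):   "A1",  # agent learns from verifiable tool outcomes
    ("agent", "agent_output"):     "A2",  # agent learns from evaluations of its final outputs
    ("tool",  "agent_agnostic"):   "T1",  # tools trained independently of any agent
    ("tool",  "agent_supervised"): "T2",  # tools tuned from a frozen agent's signals
}

def classify(adapted_component, signal_source):
    """Map a system description onto one of the four adaptation paradigms."""
    return PARADIGMS[(adapted_component, signal_source)]
```

For example, DeepRetrieval (agent updated from tool execution results) lands in A1, while Reflexion-style memory updates around a frozen agent land in T2.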
Patrick (Pengcheng) Jiang reposted
Rohan Paul @rohanpaul_ai
A solid 65-page paper from Stanford, Princeton, Harvard, University of Washington, and many other top universities. It says that almost all advanced AI agent systems can be understood as using just 4 basic ways to adapt: either by updating the agent itself or by updating its tools. It also positions itself as the first full taxonomy for agentic AI adaptation.

Agentic AI means a large model that can call tools, use memory, and act over multiple steps. Adaptation here means changing either the agent or its tools using some kind of feedback signal.

In A1, the agent is updated from tool results, like whether code ran correctly or a query found the answer. In A2, the agent is updated from evaluations of its outputs, for example human ratings or automatic checks of answers and plans. In T1, retrievers that fetch documents or domain models for specific fields are trained separately while a frozen agent just orchestrates them. In T2, the agent stays fixed but its tools are tuned from agent signals, like which search results or memory updates improve success.

The survey maps many recent systems into these 4 patterns and explains trade-offs between training cost, flexibility, generalization, and modular upgrades.
Patrick (Pengcheng) Jiang reposted
Jiacheng Lin @jclin808
When I walked around the NeurIPS 2025 poster and oral sessions, it was clear that LLMs, RL, and Agents are still the dominant keywords. New papers appear every single day, and for anyone entering this area, diving straight into all of them can feel overwhelming.

But after digging deeper, we realized that most existing agent papers actually share common patterns. In fact, they can be grouped into four adaptation paradigms. We summarize these paradigms and the landscape in our survey. Huge thanks to all the amazing co-authors!
Yu Zhang @yuz9yuz:
Excited to share our survey "Adaptation of Agentic AI" (led by @patpcj & @jclin808) unifying the rapidly growing landscape of agent adaptation (tool-execution- vs. agent-output-signaled) & tool adaptation (agent-agnostic vs. agent-supervised)! Preprint: github.com/pat-jj/Awesome…
Patrick (Pengcheng) Jiang
Thanks to Yu for boosting our survey!!! We cleanly break down the different types of adaptation of agentic AI, chart their evolution, and discuss strengths, limitations, and exciting opportunities ahead. Don’t miss it! ;)
Yu Zhang @yuz9yuz:
Excited to share our survey "Adaptation of Agentic AI" (led by @patpcj & @jclin808) unifying the rapidly growing landscape of agent adaptation (tool-execution- vs. agent-output-signaled) & tool adaptation (agent-agnostic vs. agent-supervised)! Preprint: github.com/pat-jj/Awesome…
Patrick (Pengcheng) Jiang reposted
Sagnik @saagnikkk
🚨 New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is (1) as performant as AdamW and (2) naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink optimizers for RLVR 🧵
Peter Richtarik @peter_richtarik
I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response explaining why the reviewers failed at their job. I agree with most of their comments.

The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we are doing with AI conference reviews these last few years is, basically, madness.

---

Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples (several of them outright fabricated claims):

- Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results appear in many tables, e.g. Table 2 (Section 4.2) and Appendix G.
- Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method. The reviewer clearly mistook our method for the baselines (Section 2, Table 1).
- Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6).
- Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this.

Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. In earlier submissions, the paper usually received borderline or weak-accept scores.

Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. It is a gross disrespect to the authors, a flagrant desecration of the reviewer's sacred duty, and it fundamentally undermines the integrity of the entire peer-review process.

Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. These negative reviews are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.
Patrick (Pengcheng) Jiang
Thanks! A good reference is our DeepRetrieval paper (arxiv.org/pdf/2503.00223). Our results on literature search (Figure 1a and Figure 4) show that, when training over real search engines, retrieval rewards depend on live search outcomes, and the reward distribution observed during training shifts as the policy updates, which makes on-policy RL essential for stable optimization.
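For intuition, an outcome-style retrieval reward of the kind DeepRetrieval optimizes can be sketched as top-k recall over live search results. This is a simplified stand-in for illustration, not the paper's exact reward:

```python
def retrieval_reward(retrieved_ids, relevant_ids, k=10):
    """Toy outcome reward: recall of relevant documents among the top-k
    results returned by a live search engine. Because the reward is
    computed from real search outcomes, its distribution shifts as the
    query-writing policy updates -- the instability that makes on-policy
    RL important here."""
    top = list(retrieved_ids)[:k]
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(top) & relevant) / len(relevant)
```

Off-policy data collected under an old policy was scored against a reward landscape that the new policy no longer sees, which is why sampling fresh rollouts on-policy stabilizes training.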
Ahmad Beirami @abeirami
@patpcj Interesting. Is there a good reference for this that you can please share?
Ahmad Beirami @abeirami
The debate over “Is RL needed?” often misses the point. On-policy RL is one tool for fitting the reward-tilted posterior π⋆(y∣x) ∝ p(y∣x) · exp(r(x,y)∕β), where p is the base generative model and r is a reward. This distills the capability signaled by the reward back into the generator.

RL isn’t required: you can sample, approximate, or distill the same target via best-of-N, beam search, rejection sampling, MCMC, or DPO. They are all theoretically equivalent. The real question is generalization and efficiency:

- generalization: how well does a chosen method transfer to unseen prompts? For training-based methods, as with any distillation problem, performance depends on the dataset and the loss you optimize.
- efficiency: what is the cost of sampling from the desired distribution? The cost is on par with sampling from the base model for most training solutions, while inference-time solutions generally pay a larger decoding cost.

A fair comparison across methods entails looking rigorously at "performance" vs "inference cost" (ignoring training cost). Considering only performance, best-of-N is all you need!
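The best-of-N option is simple enough to sketch in pure Python. The toy "model" and reward below are my own stand-ins, for illustration only:

```python
import random

def best_of_n(sample, reward, n, prompt):
    """Draw n candidates from the base model p(y|x) and keep the one with
    the highest reward r(x, y). As n grows, this approaches the low-beta
    limit of the reward-tilted posterior, paying only extra decoding cost
    at inference time -- no training involved."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: reward(prompt, y))

# Hypothetical stand-ins: a "model" that samples real numbers, and a
# reward that prefers values close to 10.
rng = random.Random(0)
model = lambda x: rng.gauss(0.0, 5.0)
score = lambda x, y: -abs(y - 10.0)

best = best_of_n(model, score, n=64, prompt=None)
```

Note the trade-off the tweet describes: each query now costs n decodes instead of one, which is the "larger decoding cost" that inference-time solutions pay.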
Xinyu Yang @Xinyu2ML
Happy to see the effectiveness of sparse FT in balancing new information and old knowledge. We proposed S2FT (arxiv.org/abs/2510.15103) with a similar motivation one year ago, and I believe the introduction of memory layers leads to better continual learning!
Jessy Lin @realJessyLin:
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉 -89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:
Patrick (Pengcheng) Jiang reposted
Jessy Lin @realJessyLin
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉-89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:
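The mechanism (a gradient step applied only to memory-layer parameters while everything else stays frozen) can be illustrated with a toy dictionary-of-weights sketch. This is my own simplification, not the paper's implementation, and the `"memory."` name prefix is a hypothetical convention:

```python
def sparse_update(params, grads, lr=0.1, trainable_prefix="memory."):
    """Apply a gradient step only to parameters whose names mark them as
    memory-layer weights; all other weights are left untouched. Because
    the rest of the network never moves, previously learned knowledge
    suffers far less interference than under full finetuning."""
    return {
        name: (w - lr * grads.get(name, 0.0)) if name.startswith(trainable_prefix) else w
        for name, w in params.items()
    }
```

In a real framework this corresponds to freezing all modules except the memory layers before training, so optimizer updates only ever touch the sparse, targeted slots.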
Patrick (Pengcheng) Jiang
This is awesome, on-policy RL is definitely the way to do retrieval! For anyone who is seeking open-source code for such training, please check our DeepRetrieval: github.com/pat-jj/DeepRet…, which is the first search agent trained with on-policy RL, optimizing for the retrieval outcome. To train a search agent for a frozen generator (like Sonnet in the post), please check our s3: github.com/pat-jj/s3 for more details.
Cognition @cognition:
Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!
Patrick (Pengcheng) Jiang
This is awesome, on-policy RL is definitely the way to do retrieval. For anyone who is seeking open-source code for this, please check our DeepRetrieval: github.com/pat-jj/DeepRet…, which is the first search agent trained with on-policy RL, optimizing for the retrieval outcome. To train a search agent for a frozen generator (like Sonnet in the post), please check our s3: github.com/pat-jj/s3 for more details.
Cognition @cognition
Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!
Patrick (Pengcheng) Jiang
🚀 Our DeepRetrieval, the first search agent trained with on-policy RL, will be presented at #COLM2025 tomorrow! Unfortunately, @jclin808 and I couldn’t make it to Canada due to visa issues, but our friend will be presenting on our behalf.

🕚 Time: 11 AM Thursday (Poster Session #5)
📍 Poster #51

Please stop by and take a look! If you have any questions our presenter can’t answer, feel free to reply here or email me at pj20@illinois.edu.

P.S. I’m now deploying this work @Google 🚀

Code: github.com/pat-jj/DeepRet…

#COLM_conf #COLM2025 #DeepResearch #AgenticAI
Patrick (Pengcheng) Jiang reposted
Nan Jiang @nanjiang_cs
My 3rd blogpost on PG, the topic I am least familiar with but get asked a lot, so I thought I'd just put together the very limited stuff I know on this topic. Somehow the post gets cynical from time to time🙃 nanjiang.cs.illinois.edu/2025/09/29/pg.…
Patrick (Pengcheng) Jiang reposted
Andrew Ng @AndrewYNg
Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a new SDK makes using it require only 3 simple lines of code. Please see the video for technical details. I hope this unlocks a lot of value from the "dark data" currently stuck in PDF files, and that you'll build something cool with this!
Patrick (Pengcheng) Jiang reposted
Thinking Machines @thinkymachines
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
Patrick (Pengcheng) Jiang
(13/15) Plug-and-play design: s3 supports frozen LLMs (Qwen, Claude, etc.), making it compatible with proprietary models, efficient to deploy, and easy to scale.

(14/15) One framework, many benefits: 🚀 Efficient 🔌 Modular 🧠 Smart search 📉 Low data. Perfect for both research and real-world deployment.
Patrick (Pengcheng) Jiang
(12/15) Compact & effective: s3 converges fast, without unnecessary expansion.