

Yuqing Yang




🏧Giving your agent unlimited tool calls doesn't make it smarter. 💡Why? It lacks 'Budget Awareness'! Introducing Budget Tracker, a simple plug-in that enables more effective scaling behaviors: higher performance, lower cost. Paper: arxiv.org/pdf/2511.17006



🤔Most LLMs now have context windows of 128K tokens or more, but are they good at generating long outputs, such as writing an 8K-token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.







🤔Can LMs learn to skip steps to improve reasoning efficiency while maintaining accuracy? ✅The answer is Yes! In our #NeurIPS 2024 work, we show this behavior boosts efficiency, maintains accuracy, and even enhances generalization in OOD scenarios! 🚀arxiv.org/pdf/2411.01855 🧵⬇️

