Zhou Yu
@Zhou_Yu_AI
734 posts
Founder of https://t.co/9KM4uFSKBQ, Associate Professor at Columbia. Making AI agent design and deployment easy, safe, and fast! Forbes 30 under 30.

New York, USA · Joined July 2015
1.3K Following · 12.7K Followers

Pinned Tweet
Zhou Yu@Zhou_Yu_AI·
Check this out! We are hosting a workshop for agent researchers to meet and get to know each other. If you are working on anything agent-related, from an ML, Systems, or HCI perspective, submit to the workshop and meet the rest of the community.
DAPLab@DAP__Lab

🤖 Calling all academic researchers in Agents! We are excited to announce North East AI Agents Day, a one-day workshop bringing together communities in ML, Systems, and HCI! 📅 May 8th 📍 New York 💡 Submit your extended abstract (DDL: Apr 1st)! More: ne-agents-day.github.io

0 replies · 1 repost · 13 likes · 4.9K views
Zhou Yu retweeted
Arklex AI@ArklexAI·
ArkSim plugs directly into the AI agent framework you're already using.
- OpenAI Agents SDK
- Claude Agent SDK
- Google ADK
- LangChain
- LangGraph
- LlamaIndex
- CrewAI
- AutoGen
- Pydantic AI
- Mastra
- Vercel AI SDK
- Smol Agents
- Rasa
And more ...
One testing layer. No migration, no lock-in.
github.com/arklexai/arksim
#AIAgents #OpenSource #DeveloperTools #LLM #AIEvaluation
0 replies · 1 repost · 2 likes · 1.5K views
Zhou Yu@Zhou_Yu_AI·
A good agent testing tool has to be compatible with different agent frameworks, since different frameworks have different strengths and weaknesses for different applications. Try our open-source simulation-based testing tool, ArkSim!
Arklex AI@ArklexAI

Agent frameworks are exploding. Evaluation tooling isn't.
ArkSim v0.2.0 makes agent evaluation framework-agnostic. New integration examples for:
- OpenAI Agents SDK
- Claude Agent SDK
- Google ADK
- LangChain / LangGraph
- CrewAI
- LlamaIndex
Build agents with any stack. Evaluate with one framework.
Repo: github.com/arklexai/arksim
#AIAgents #LLM #OpenSource
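The "build with any stack, evaluate with one layer" idea above is essentially the adapter pattern: each framework is wrapped behind a single common interface, so the evaluation logic never touches framework-specific code. Below is a minimal sketch of that pattern with hypothetical names (`AgentAdapter`, `EchoAdapter`, `evaluate`), not ArkSim's actual API:

```python
from typing import Protocol

class AgentAdapter(Protocol):
    """Common interface every framework adapter must satisfy (hypothetical)."""
    def send(self, message: str) -> str: ...

class LangChainAdapter:
    """Wraps a LangChain runnable behind the common interface."""
    def __init__(self, chain):
        self.chain = chain
    def send(self, message: str) -> str:
        return self.chain.invoke(message)

class EchoAdapter:
    """Trivial stand-in agent, useful for testing the harness itself."""
    def send(self, message: str) -> str:
        return f"echo: {message}"

def evaluate(agent: AgentAdapter, prompts: list[str]) -> list[str]:
    """Evaluation logic sees only the adapter, never the framework behind it."""
    return [agent.send(p) for p in prompts]

replies = evaluate(EchoAdapter(), ["hi", "help me"])
print(replies)
```

Adding support for a new framework then means writing one small adapter class; the evaluation layer itself never changes.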

0 replies · 0 reposts · 2 likes · 931 views
Zhou Yu@Zhou_Yu_AI·
We’ve been building AI agents long enough to learn one thing: testing them is harder than building them.

Once an agent is live, real users behave in ways you never anticipated. They change their minds mid-conversation. They ask off-script questions. They hit edge cases you’d never think to test. Manual test cases don’t capture this.

So we built ArkSim. It simulates realistic users, runs diverse interactions against your agent, and surfaces failures before your users do.

⛵ ArkSim is now open source.
pip install arksim
Repo: github.com/arklexai/arksim
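The loop described above — generate diverse simulated users, run them against the agent, collect failures — can be sketched in a few lines. This is a hypothetical illustration of the technique (the personas, the `toy_agent`, and the failure heuristic are all invented for the example), not ArkSim's real API:

```python
def simulated_user(persona: str, turn: int) -> str:
    """Scripted user messages per persona; a real simulator would drive an LLM."""
    scripts = {
        "decisive":   ["Book a flight to Paris.", "Confirm the booking."],
        "indecisive": ["Book a flight to Paris.", "Actually, make it Rome instead."],
        "off_script": ["Book a flight to Paris.", "By the way, what's your refund policy?"],
    }
    return scripts[persona][turn]

def toy_agent(message: str) -> str:
    """Stand-in agent under test: handles the happy path, not topic changes."""
    if "Paris" in message or "Confirm" in message:
        return "Done."
    return "Sorry, I don't understand."

def run_simulation(personas: list[str], turns: int = 2) -> list[tuple[str, str]]:
    """Run every persona against the agent and surface the failing exchanges."""
    failures = []
    for persona in personas:
        for t in range(turns):
            msg = simulated_user(persona, t)
            reply = toy_agent(msg)
            if "don't understand" in reply:  # crude failure signal
                failures.append((persona, msg))
    return failures

failures = run_simulation(["decisive", "indecisive", "off_script"])
print(failures)
```

The decisive persona passes, while the mind-changing and off-script personas expose exactly the kinds of edge cases the tweet describes — before a real user hits them.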
6 replies · 15 reposts · 194 likes · 14.2K views
Zhou Yu@Zhou_Yu_AI·
Join us at Columbia to hear the former Twitter CEO talk about his new startup.
DAPLab@DAP__Lab

📡 Columbia Engineering AI Entrepreneurship Series
Title: A Talk about Parallel.AI (TBD)
Speaker: Parag Agrawal
Location: Davis Auditorium
Date/Time: Thursday, February 5, 2026, 11:00 AM ET
Bio: Parag Agrawal is the founder of Parallel Web Systems, a company unlocking the web for AI agents. Previously, he spent 11 years at Twitter, where he joined as an engineer before serving as CTO, and then CEO. Parag has a PhD in Computer Science from Stanford University and a Bachelor's degree in Computer Science and Engineering from IIT Bombay.

0 replies · 1 repost · 9 likes · 2K views
Zhou Yu retweeted
Billy Xuanming Zhang@XuanmingZhang07·
LLMs can “think longer” and get better answers… but what if you can’t afford long reasoning? In our new paper, we study how LLMs reason under fixed computation budgets, where producing useful partial solutions quickly matters more than exhaustive reasoning. 🧵(1/n) 🔗: arxiv.org/pdf/2601.11038
1 reply · 11 reposts · 25 likes · 4.8K views
Zhou Yu@Zhou_Yu_AI·
Vibe coding is popular, but we keep seeing the same common bugs in these agents. Have you encountered similar bugs?
DAPLab@DAP__Lab

[New Blog on Vibe Coding!] Vibe Coding Needs Policy Enforcement

Vibe-coding is both amazing and infuriating. If I want to spin up a brand-new app from scratch? Holy shit, it's magic. It's fast, it's fluid, it feels like collaborating with an engineer who's always in a good mood. But the moment I ask it to do something more risky, tricky, or unspecified, where my particular taste and coding style matter (like adding a decently complex feature to a codebase I care about), I'm suddenly fighting with it. Vibe-coding devolves into vibe-debugging, vibe-backtracking, vibe-arguing.

I isolated four recurring agent behaviors behind most vibe-coding failures:
1) Skipping Steps: The agent confidently says it will do something ("I'll build the backend and the frontend!") and then only builds half, forgetting entire chunks of functionality.
2) Ignoring Conventions and Style: Even with clear patterns in my codebase and explicit rules (e.g., keep my imports at the top of the file), the AI still goes rogue. It adds docstrings when I never use them, rearranges file structures, and overengineers components.
3) Making Wrong Assumptions: Because it's so eager to help, the agent commits to the first interpretation it forms. It builds whole flows and architectures around assumptions I would've corrected if it had asked one more question.
4) Local Optimization (Hacking Instead of Engineering): Agents love the quickest apparent fix. For example, when writing code for a Rubik's cube app, it might try to hardcode cube states instead of writing a real solver.

Check out the full blog post to see how existing solutions can still fail to fix these issues, and how we should approach this instead (hint: vibe coding needs policy enforcement)! See more details here: daplab.cs.columbia.edu/general/2026/0…
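One failure mode above — the agent ignoring an explicit convention like "keep my imports at the top of the file" — hints at what policy enforcement could look like in practice: a deterministic check run on every agent edit, rejecting code that violates the rule rather than hoping the prompt is obeyed. A minimal sketch of that idea using Python's standard `ast` module (the blog's actual approach may differ; `imports_at_top` is an invented name):

```python
import ast

def imports_at_top(source: str) -> bool:
    """Policy check: all import statements must precede any other top-level code."""
    tree = ast.parse(source)
    seen_non_import = False
    for i, node in enumerate(tree.body):
        if i == 0 and isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant):
            continue  # allow a leading module docstring before the imports
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if seen_non_import:
                return False  # an import appeared after real code: policy violated
        else:
            seen_non_import = True
    return True

good = "import os\nimport sys\n\nx = os.getcwd()\n"
bad = "x = 1\nimport os\n"
print(imports_at_top(good), imports_at_top(bad))  # prints: True False
```

Because the check is a program rather than a prompt, the agent cannot "go rogue" past it: an edit that buries an import mid-file simply fails the gate and gets sent back.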

0 replies · 0 reposts · 4 likes · 2K views
Zhou Yu retweeted
DAPLab@DAP__Lab·
🚀 Excited to share that DAP Lab has 7 papers accepted at #NeurIPS2025, covering multi-agent reasoning, LLM caching, persona risks, system tuning via LLM agents, simulation-first agent training, and RL theory 👇

🔍 Check them out if you are at #NeurIPS2025! We'd love feedback, discussions, and potential collaborations. Paper list here:
• Multi-agent Markov Entanglement (Shuze Chen, Tianyi Peng): Spotlight + winner of INFORMS JFIG & 2nd place in the George Nicholson Student Paper Competition 🏆
• Tail-Optimized Caching for LLM Inference (Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng): improving LLM inference efficiency 👏
• LLM Generated Persona Is a Promise With a Catch (Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng): a position paper reflecting on the strengths & caveats of LLM-derived personas 👩‍👩‍👦‍👦
• LLM Agents for Always-On Operating System Tuning (Georgios Liargkovas, Vahab Jabrayilov, Hubertus Franke, Kostis Kaffes): leveraging LLMs for live OS tuning, showing better performance than classical ML tuning 🔧
• RAISE: Reliable Agent Improvement via Simulated Experience (Sahar Omidi Shayegan, Joshua Meyer, Victor Shih, Sebastian Sosa, Tianyi Peng, Kostis Kaffes, Eugene Wu, Andi Partovi, Mehdi Jamei): a simulation-first AI-agent training framework 🔄
• Q-learning with Posterior Sampling (Priyank Agrawal, Shipra Agrawal, Azmat Azati): a new RL algorithm achieving near-optimal theoretical guarantees in tabular episodic MDPs 🎯
• Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (Xinyue Zhu*, Binghao Huang*, Yunzhu Li): a scalable multimodal data collection system that empowers physical agents (i.e., robots) to interact with the world 🤖
#MachineLearning #AI #LLM #Systems #MultiAgent #NeurIPS
1 reply · 6 reposts · 12 likes · 2.8K views
Zhou Yu retweeted
Xiao Yu@xy2437·
Why can (V)LM agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in a model's ability to understand and *simulate* the environment's dynamics, and we propose **Dyna-Mind** to address this! 🧵[1/n]
1 reply · 4 reposts · 10 likes · 2.6K views