Zhou Yu

734 posts

Zhou Yu

@Zhou_Yu_AI

Founder of https://t.co/9KM4uFSKBQ, Associate Professor at Columbia. Making ai agent design and deployment easy, safe, and fast! Forbes 30 under 30.

New York, USA Katılım Temmuz 2015

1.3K Takip Edilen12.7K Takipçiler

Sabitlenmiş Tweet

Zhou Yu@Zhou_Yu_AI·12 Şub

Check this out! We are hosting a workshop for agent researchers to meet and get to know each other. If you have anything related to agents, either in ML, System, or HCI perspective, submit to the workshop and meet the rest of the community.

DAPLab@DAP__Lab

🤖 Calling all academic researchers in Agents! We are excited to announce North East AI Agents Day, a one-day workshop bringing together communities in ML, Systems, and HCI! 📅 May 8th 📍 New York 💡 Submit your extended abstract (DDL: Apr 1st)! More: ne-agents-day.github.io

English

4.9K

Zhou Yu@Zhou_Yu_AI·6d

It is difficult to detect multi-turn agent errors. See an example below of how an engineer was able to overcome that using our ArkSim tool!

Arklex AI@ArklexAI

Passed 100% of manual single-turn tests until they ran multi-turn eval with ArkSim. An exam admin agent had one rule: no academic questions. ArkSim simulated a user who rephrased the same question mid-conversation. The guardrail broke in 4 turns. That test case didn't exist until ArkSim wrote it. Full breakdown: arklex.ai/blog/arksim-gu… Test your agent with ArkSim: github.com/arklexai/arksim #AIAgents #ArkSim #LLMOps

English

1.2K

Zhou Yu@Zhou_Yu_AI·19 Mar

Our Open-source simulation tool, which is used to test and evaluate AI agents, has more integrations with various agent frameworks. If you want us ot conenct other framework, drop a line.

Arklex AI@ArklexAI

ArkSim plugs directly into the AI agent framework you're already using. - OpenAI Agents SDK - Claude Agent SDK - Google ADK - LangChain - LangGraph - LlamaIndex - CrewAI - AutoGen - Pydantic AI - Mastra - Vercel AI SDK - Smol Agents - Rasa And more ... One testing layer. No migration, no lock-in. github.com/arklexai/arksim #AIAgents #OpenSource #DeveloperTools #LLM #AIEvaluation

English

Zhou Yu retweetledi

Arklex AI@ArklexAI·19 Mar

English

1.5K

Zhou Yu@Zhou_Yu_AI·17 Mar

ArkSim automatically quantifies whether your PR has regression. Try our open source to improve your agent's reliability!

Arklex AI@ArklexAI

Stop testing AI agents manually. ArkSim v0.3.0 adds CI integration, so every PR gets a quality gate automatically. Try it: github.com/arklexai/arksim #AIAgents #LLMOps

English

818

Zhou Yu@Zhou_Yu_AI·13 Mar

Test your OpenClaw agent before deployment using Arksim Open Source!

Arklex AI@ArklexAI

OpenClaw agents are everywhere right now, but many break once deployed. Use our open-source Arksim tool to automatically thoroughly test your OpenClaw agents before shipping and catch failures early. Avoid painful production surprises. Repo:github.com/arklexai/arksi… #OpenClaw #AIEngineer #AIEval

English

1.2K

Zhou Yu@Zhou_Yu_AI·10 Mar

A good agent testing tool has to be compatible with different agent frameworks, as different frameworks may have different pros and cons for different applications. Try our open-source simulation-based testing tool, Arksim!

Arklex AI@ArklexAI

Agent frameworks are exploding. Evaluation tooling isn’t. ArkSim v0.2.0 makes agent evaluation framework-agnostic. New integration examples for: OpenAI Agents SDK Claude Agent SDK Google ADK LangChain / LangGraph CrewAI LlamaIndex Build agents with any stack. Evaluate with one framework. Repo: github.com/arklexai/arksim #AIAgents #LLM #OpenSource

English

931

Zhou Yu@Zhou_Yu_AI·5 Mar

Try our open-source Arksim tool to find agents' bugs before your users.

Arklex AI@ArklexAI

Shipping AI agents is easy 🤖 Testing them is harder 🧪 Your agent looks fine… until a user: • changes their mind mid-conversation • asks something off-script • hits a weird edge case 🐛 Most teams write manual test cases. They take forever and never cover enough. So we built ArkSim. It generates synthetic users, runs multi-turn conversations, and finds failures before production users do. Install in seconds: pip install arksim 🚀 github.com/arklexai/arksim #AI #AIAgents #OpenSource #AIEngineering

English

1.5K

Zhou Yu@Zhou_Yu_AI·5 Mar

We’ve been building AI agents long enough to learn one thing: Testing them is harder than building them. Once an agent is live, real users behave in ways you never anticipated. They change their minds mid-conversation. They ask off-script questions. They hit edge cases you’d never think to test. Manual test cases don’t capture this. So we built ArkSim. It simulates realistic users, runs diverse interactions against your agent, and surfaces failures before your users do. ⛵ ArkSim is now open source. pip install arksim Repo: github.com/arklexai/arksim

English

194

14.2K

Zhou Yu@Zhou_Yu_AI·15 Şub

Come to meet Amitl previous president of Datadog at Columbia next Thursday!

DAPLab@DAP__Lab

📢[AI Entrepreneurship Series Talks] Title A Talk about STLabs (TBD) Speaker Amit Agarwal Location Davis Auditorium Date/Time Thursday, February 19, 2026, 11:40 AM ET Bio Amit Agarwal is the founder of Standard Template Labs (STLabs), where he is building a new platform in enterprise software. Before STLabs, Amit spent a year as a General Partner at ICONIQ Capital, investing in and advising technology companies. He also serves on the Board of Directors at Datadog, where he previously spent 13 years as an executive, joining as employee number eight. At Datadog, Amit helped build the company from its earliest days through its growth into a public company. He built and led teams across product, marketing, sales, corporate development, and operations — having started in the early years doing hands-on product management, go-to-market, and customer-facing work.

English

1.3K

Zhou Yu@Zhou_Yu_AI·3 Şub

Join us at Columbia to listen to previous Twitter CEO on his new startup

DAPLab@DAP__Lab

📡Columbia Engineering AI Entrepreneurship Series Title: A Talk about Parallel.AI (TBD) Speaker: Parag Agrawal Location: Davis Auditorium Date/Time: Thursday, February 5, 2026, 11:00 AM ET Bio: Parag Agrawal is the founder of Parallel Web Systems, a company unlocking the web for AI agents. Previously, he spent 11 years at Twitter, where he joined as an engineer before serving as CTO, and then CEO. Parag has a PhD from Stanford University in Computer Science and a Bachelor’s degree in Computer Science and Engineering from IIT, Bombay.

English

Zhou Yu@Zhou_Yu_AI·28 Oca

Check out our paper on how to make agents better reasoners given a budget.

Billy Xuanming Zhang@XuanmingZhang07

LLMs can “think longer” and get better answers… but what if you can’t afford long reasoning? In our new paper, we study how LLMs reason under fixed computation budgets, where producing useful partial solutions quickly matters more than exhaustive reasoning. 🧵(1/n) 🔗: arxiv.org/pdf/2601.11038

English

2.5K

Zhou Yu retweetledi

Billy Xuanming Zhang@XuanmingZhang07·27 Oca

English

4.8K

Zhou Yu@Zhou_Yu_AI·23 Oca

Vibe coding is popular, but we see more common bugs in these agents. Have you encountered similar bugs?

DAPLab@DAP__Lab

[New Blog on Vibe Coding!] Vibe Coding needs Policy Enforcement Vibe-coding is both amazing and infuriating. If I want to spin up a brand-new app from scratch? Holy shit, it’s magic. It’s fast, it’s fluid, it feels like collaborating with an engineer who’s always in a good mood. But the moment I ask it to do something more risky, tricky, or unspecified—where my particular taste and coding style matter (like adding a decently complex feature to a codebase I care about), I’m suddenly fighting with it. Vibe-coding devolves into vibe-debugging, vibe-backtracking, vibe-arguing. I isolated four recurring agent behaviors behind most vibe-coding failures: 1) Skipping Steps - The agent confidently says it will do something (“I’ll build the backend and the frontend!”) and then only builds half, forgetting entire chunks of functionality. 2) Ignoring Conventions and Style - Even with clear patterns in my codebase and explicit rules (ie: keep my imports at the top of the file), AI still goes rogue. It adds docstrings when I never use them, rearranges file structures, overengineers components. 3) Making Wrong Assumptions - Because it’s so eager to help, the agent commits to the first interpretation it forms. It builds whole flows and architectures around assumptions I would’ve corrected if it had asked one more question. 4) Local Optimization (Hacking Instead of Engineering) - Agents love the quickest apparent fix. For example, when writing code for a Rubik’s cube app, it might try to hardcode cube states instead of writing a real solver.Check out the full blog post to see how existing solutions can still fail to fix these issues, and how we should approach this instead (hint – vibe coding needs policy enforcement)! See more details here: daplab.cs.columbia.edu/general/2026/0…

English

Zhou Yu@Zhou_Yu_AI·16 Oca

Excited to work on this project, we aim to solve the robustness problem of the agent environment. Soon, you can use all these popular environments by calling a container in Azure.

DAPLab@DAP__Lab

🎉 Excited to share that our project has been newly funded by Microsoft Research! Towards Robust Generalization in Agentic AI via Environment Scaling explores how agentic systems can generalize more reliably by systematically scaling and diversifying their environments. Grateful for the support! Looking forward to pushing this direction forward! 🚀 🔗 microsoft.com/en-us/research…

English

4.1K

Zhou Yu retweetledi

DAPLab@DAP__Lab·3 Ara

🚀 Excited to share that DAP Lab has 7 papers accepted at #NeurIPS2025 — covering multi-agent reasoning, LLM caching, persona risks, system tuning via LLM agents, simulation-first agent training, and RL theory 👇 🔍Check them out if you are at #NeurIPS2025! We’d love feedback, discussions, and potential collaborations. Paper list here: • Multi-agent Markov Entanglement (Shuze Chen, Tianyi Peng) — Spotlight + winner of INFORMS JFIG & 2nd place in George Nicholson Student Paper Competition 🏆 • Tail-Optimized Caching for LLM Inference (Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng) — improving LLM inference efficiency 👏 • LLM Generated Persona Is a Promise With a Catch (Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng) — a position paper reflecting on strengths & caveats of LLM-derived personas 👩‍👩‍👦‍👦 • LLM Agents for Always-On Operating System Tuning (Georgios Liargkovas, Vahab Jabrayilov, Hubertus Franke, Kostis Kaffes) — leveraging LLMs for live OS tuning, showing better performance than classical ML tuning.🔧 • RAISE: Reliable Agent Improvement via Simulated Experience (Sahar Omidi Shayegan, Joshua Meyer, Victor Shih, Sebastian Sosa, Tianyi Peng, Kostis Kaffes, Eugene Wu, Andi Partovi, Mehdi Jamei) — simulation-first AI-agent training framework 🔄. • Q-learning with Posterior Sampling (Priyank Agrawal, Shipra Agrawal, Azmat Azati) — a new RL algorithm achieving near-optimal theory guarantees in tabular episodic MDPs 🎯 • Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (Xinyue Zhu*, Binghao Huang*, Yunzhu Li) — a scalable multimodal data collection system that empowers physical agents (i.e., robots) to interact with the world. 🤖 #MachineLearning #AI #LLM #Systems #MultiAgent #NeurIPS

English

2.8K

Zhou Yu@Zhou_Yu_AI·19 Kas

If you want to discover the secret behind the top-tier open-source LLM Qwen, join Columbia's AI lecture series this Friday! Sign up here: columbiaengineering.campusgroups.com/StrategicEvent…

English

2.1K

Zhou Yu retweetledi

Xiao Yu@xy2437·15 Eki

Why can (V)LMs agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in models' ability to understand and *simulate* the environment’s dynamics — and propose **Dyna-Mind** to address this! 🧵[1/n]

English

2.6K

Zhou Yu@Zhou_Yu_AI·29 Eki

How could a diffusion model reason? Check out our new paper

Zachary Horvitz@zachary_horvitz

✨Masked Diffusion Language Models✨ are great for reasoning, but not just for the reasons you think! Fast parallel decoding? 🤔 Any-order decoding? 🤨 Plot twist: MDLMs offer A LOT MORE for inference and post-training! 🎢🧵

English

Keşfet

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry