DAPLab
@DAP__Lab

33 posts

Future of Data, Agents, and Processes (DAP). DAPLab is a group of @Columbia faculty and their PhD students at the forefront of applied research in Agentic AI.

New York, NY · Joined July 2025
20 Following · 123 Followers
DAPLab @DAP__Lab
[📢 Upcoming AI Entrepreneurship Series Talk]
Title: Running AI is Harder Than Training It: The Engineering Behind Inference
Speaker: Sidharth Shanker
Location: Davis Auditorium
Date/Time: Thursday, April 2, 2026, 11:30 AM ET
Bio: Sidharth Shanker leads the Core Product Engineering team at Baseten, where he focuses on building a robust and scalable platform for deploying and serving machine learning models in production. With over a decade of experience in software engineering, he has worked across a range of industries, including e-commerce, genomics, and social media, developing systems that power real-world applications at scale. At Baseten, Sidharth is particularly interested in the challenges of inference infrastructure: ensuring models are served securely, reliably, and efficiently to end users. His work sits at the intersection of machine learning systems and developer experience, with an emphasis on making advanced AI capabilities accessible in production environments.
Abstract: In this talk, Sidharth will explore why deploying AI systems in real-world applications presents challenges distinct from model training. He will discuss the engineering complexities behind inference, including serving models securely, reliably, and at scale, and provide insight into the hidden systems work behind a single request to a large language model.
DAPLab @DAP__Lab
Submissions are now open on OpenReview for North East AI Agents Day! Deadline is April 1st! openreview.net/group?id=NE_Ag…
DAPLab @DAP__Lab

🤖 Calling all academic researchers in Agents! We are excited to announce North East AI Agents Day, a one-day workshop bringing together communities in ML, Systems, and HCI! 📅 May 8th 📍 New York 💡 Submit your extended abstract (DDL: Apr 1st)! More: ne-agents-day.github.io

DAPLab @DAP__Lab
📢 [New AI Entrepreneurship Series Talk]
Title: AI Attacks
Speaker: Dr. Neil Daswani
Location: Davis Auditorium
Date/Time: Thursday, March 19, 2026, 11:30 AM ET
Bio: Dr. Neil Daswani is a CISO-in-Residence at Firebolt Ventures and Co-Academic Director of Stanford’s Advanced Cybersecurity Program. After completing his PhD at Stanford University and leading security initiatives at Google, he co-founded Dasient, a cybersecurity company funded by Google Ventures and later acquired by Twitter/X. After his time at Twitter, he served as CISO of several public companies including LifeLock, Symantec’s Consumer Business Unit, and QuantumScape. Today, he advises multiple venture capital funds and focuses on both securing artificial intelligence and applying AI to cybersecurity. Dr. Daswani has co-authored two books, Big Breaches: Cybersecurity Lessons for Everyone and Foundations of Security: What Every Programmer Needs to Know. He holds over a dozen patents, has published numerous technical articles, and earned his PhD and MS in Computer Science from Stanford and his BS in Computer Science with honors and distinction from Columbia University.
Abstract: In this talk, Dr. Daswani will discuss the emerging era of non-human adversaries, where AI does not merely assist hackers but autonomously executes the majority of attack workflows. He will examine key developments in AI-driven cyber threats, from AI-orchestrated espionage campaigns to multimillion-dollar deepfake fraud incidents, and discuss what these developments mean for the future of cybersecurity and artificial intelligence.
DAPLab @DAP__Lab
📢 [Upcoming AI Entrepreneurship Series Talk]
Title: Scaling RL Rollouts: Agent-Native Infrastructure with Daytona
Speaker: Ivan Burazin
Location: Davis Auditorium, 530 W 120th St, New York, NY 10027, USA
Date/Time: March 5, 2026, 11:30 AM - 1:00 PM ET
Bio: Ivan Burazin is the co-founder and CEO of Daytona, one of the fastest-growing infrastructure companies of its generation. Daytona is building agent-native cloud infrastructure that enables AI agents to securely run, fork, and manage stateful runtime environments at scale. Backed by $31M, including a $24M Series A led by FirstMark Capital, Daytona powers millions of sandboxes per day for startups and Fortune 500 companies building autonomous AI systems. Previously, Ivan co-founded Codeanywhere, one of the first cloud IDEs (2009), and created Shift, Europe’s leading developer conference, acquired by Infobip in 2021. He later joined Infobip’s executive board as Chief Developer Experience Officer.
Abstract: In this talk, we’ll outline why a new class of agent-native infrastructure is emerging, what problems it is designed to solve, and the core use cases driving it, from autonomous coding agents to large-scale evaluation and training workloads. Daytona is an agent-native control plane designed to orchestrate isolated, stateful sandbox environments at scale. We’ll break down the infrastructure challenges behind isolation, state management, and massive parallelism, and why traditional VM and container stacks fall short. As a concrete example, we’ll walk through scaling RL rollouts, showing how tens of thousands of environments can be provisioned and orchestrated in minutes as part of a high-throughput RL pipeline.
DAPLab @DAP__Lab
📢 [AI Entrepreneurship Series Talk]
Title: A Talk about STLabs (TBD)
Speaker: Amit Agarwal
Location: Davis Auditorium
Date/Time: Thursday, February 19, 2026, 11:40 AM ET
Bio: Amit Agarwal is the founder of Standard Template Labs (STLabs), where he is building a new platform in enterprise software. Before STLabs, Amit spent a year as a General Partner at ICONIQ Capital, investing in and advising technology companies. He also serves on the Board of Directors at Datadog, where he previously spent 13 years as an executive, joining as employee number eight. At Datadog, Amit helped build the company from its earliest days through its growth into a public company. He built and led teams across product, marketing, sales, corporate development, and operations, having started in the early years doing hands-on product management, go-to-market, and customer-facing work.
DAPLab @DAP__Lab
🤖 Calling all academic researchers in Agents! We are excited to announce North East AI Agents Day, a one-day workshop bringing together communities in ML, Systems, and HCI! 📅 May 8th 📍 New York 💡 Submit your extended abstract (DDL: Apr 1st)! More: ne-agents-day.github.io
DAPLab @DAP__Lab
📡 Columbia Engineering AI Entrepreneurship Series
Title: A Talk about Parallel.AI (TBD)
Speaker: Parag Agrawal
Location: Davis Auditorium
Date/Time: Thursday, February 5, 2026, 11:00 AM ET
Bio: Parag Agrawal is the founder of Parallel Web Systems, a company unlocking the web for AI agents. Previously, he spent 11 years at Twitter, where he joined as an engineer before serving as CTO, and then CEO. Parag has a PhD from Stanford University in Computer Science and a Bachelor’s degree in Computer Science and Engineering from IIT Bombay.
DAPLab reposted
Billy Xuanming Zhang @XuanmingZhang07
LLMs can “think longer” and get better answers… but what if you can’t afford long reasoning? In our new paper, we study how LLMs reason under fixed computation budgets, where producing useful partial solutions quickly matters more than exhaustive reasoning. 🧵(1/n) 🔗: arxiv.org/pdf/2601.11038
DAPLab reposted
Jenny Ma @jenny_ma_
i've isolated four recurring agent behaviors behind most vibe-coding failures: 1. skipping steps 2. ignoring conventions and style 3. making wrong assumptions 4. local optimization check out my new blog post for more details! daplab.cs.columbia.edu/general/2026/0…
DAPLab @DAP__Lab
[New Blog on Vibe Coding!] Vibe Coding Needs Policy Enforcement

Vibe-coding is both amazing and infuriating. If I want to spin up a brand-new app from scratch? Holy shit, it’s magic. It’s fast, it’s fluid, it feels like collaborating with an engineer who’s always in a good mood. But the moment I ask it to do something more risky, tricky, or unspecified, where my particular taste and coding style matter (like adding a decently complex feature to a codebase I care about), I’m suddenly fighting with it. Vibe-coding devolves into vibe-debugging, vibe-backtracking, vibe-arguing.

I isolated four recurring agent behaviors behind most vibe-coding failures:
1) Skipping Steps: The agent confidently says it will do something (“I’ll build the backend and the frontend!”) and then only builds half, forgetting entire chunks of functionality.
2) Ignoring Conventions and Style: Even with clear patterns in my codebase and explicit rules (e.g., keep my imports at the top of the file), AI still goes rogue. It adds docstrings when I never use them, rearranges file structures, overengineers components.
3) Making Wrong Assumptions: Because it’s so eager to help, the agent commits to the first interpretation it forms. It builds whole flows and architectures around assumptions I would’ve corrected if it had asked one more question.
4) Local Optimization (Hacking Instead of Engineering): Agents love the quickest apparent fix. For example, when writing code for a Rubik’s cube app, it might try to hardcode cube states instead of writing a real solver.

Check out the full blog post to see how existing solutions can still fail to fix these issues, and how we should approach this instead (hint: vibe coding needs policy enforcement)! See more details here: daplab.cs.columbia.edu/general/2026/0…
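A rule like “keep my imports at the top of the file” is exactly the kind of policy that can be checked mechanically before agent output is accepted. A minimal sketch of such a check; the function name and the single rule are illustrative, not the blog’s actual enforcement mechanism:

```python
import ast

def imports_at_top(source: str) -> bool:
    """Return True if every import appears before any non-import statement."""
    tree = ast.parse(source)
    seen_non_import = False
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # An import after ordinary code violates the policy.
            if seen_non_import:
                return False
        elif not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant)):
            # A module docstring before the imports is still fine.
            seen_non_import = True
    return True

assert imports_at_top("import os\nx = 1\n")
assert not imports_at_top("x = 1\nimport os\n")
```

A gate like this rejects the agent’s patch automatically instead of relying on the prompt being obeyed, which is the “enforcement, not suggestion” idea.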
DAPLab @DAP__Lab
9 Critical Failure Patterns of Coding Agents

Vibe coding feels like magic. Until you try to ship a real feature.

We spent last semester using and evaluating the top AI agents (Claude, Cline, Cursor, Replit) by building 15+ real-world applications. We collected hundreds of failures and found 9 recurring patterns that repeat across every single tool.

The reality is that agents often prioritize runnable code over correct code. They suppress errors to keep the app "alive," even if the underlying logic is broken. We documented the critical failure patterns, including:
1) Business Logic Mismatch: Why agents struggle with basic rules (like applying a discount to a shopping cart total).
2) Presentation & UI Grounding Mismatch: Why layouts break because agents can't see.
3) Exception & Error Handling: How agents suppress errors just to make code run.

Read more about it here! daplab.cs.columbia.edu/general/2026/0…
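The “suppress errors to keep the app alive” pattern can be made concrete with the shopping-cart example. A hypothetical contrast (the function names and cart data are ours, not from the blog):

```python
def cart_total_suppressing(prices):
    # The failure pattern: swallow the exception so the app keeps running,
    # silently reporting a $0 total when the cart data is broken.
    try:
        return sum(prices)
    except TypeError:
        return 0

def cart_total_strict(prices):
    # The engineering alternative: validate inputs and fail loudly,
    # so the broken logic surfaces instead of hiding behind a wrong answer.
    bad = [p for p in prices if not isinstance(p, (int, float))]
    if bad:
        raise ValueError(f"non-numeric prices in cart: {bad!r}")
    return sum(prices)

print(cart_total_suppressing([10, "N/A", 5]))  # 0, and the bug is invisible
```

The first version is what “runnable over correct” looks like in practice: the app renders, the total is wrong, and nothing tells you.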
DAPLab @DAP__Lab
Why Vibe Coding Fails and How to Fix It

Everyone is talking about how AI agents will 10x developer productivity. But anyone who has actually built a real app with Cursor, Cline, or Replit knows the reality: the first draft looks amazing. But as soon as you try to iterate? The application starts breaking.

This is the struggle of Vibe Debugging. It starts out looking great. But then you encounter silent errors and buggy logic. You realize the AI doesn't actually understand what you are building, and you are stuck trying to fix a black box.

At Columbia DAPLab, we are investigating exactly why this happens. We have written a blog series on the reality of Vibe Debugging and how to close the gap between demo and production. Read our first part here! daplab.cs.columbia.edu/general/2026/0…
DAPLab @DAP__Lab
🎉 Excited to share that our project has been newly funded by Microsoft Research! Towards Robust Generalization in Agentic AI via Environment Scaling explores how agentic systems can generalize more reliably by systematically scaling and diversifying their environments. Grateful for the support! Looking forward to pushing this direction forward! 🚀 🔗 microsoft.com/en-us/research…
DAPLab @DAP__Lab
🚀 Excited to share that DAP Lab has 7 papers accepted at #NeurIPS2025 — covering multi-agent reasoning, LLM caching, persona risks, system tuning via LLM agents, simulation-first agent training, and RL theory 👇

🔍 Check them out if you are at #NeurIPS2025! We’d love feedback, discussions, and potential collaborations.

Paper list:
• Multi-agent Markov Entanglement (Shuze Chen, Tianyi Peng) — Spotlight + winner of INFORMS JFIG & 2nd place in George Nicholson Student Paper Competition 🏆
• Tail-Optimized Caching for LLM Inference (Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng) — improving LLM inference efficiency 👏
• LLM Generated Persona Is a Promise With a Catch (Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng) — a position paper reflecting on strengths & caveats of LLM-derived personas 👩‍👩‍👦‍👦
• LLM Agents for Always-On Operating System Tuning (Georgios Liargkovas, Vahab Jabrayilov, Hubertus Franke, Kostis Kaffes) — leveraging LLMs for live OS tuning, showing better performance than classical ML tuning 🔧
• RAISE: Reliable Agent Improvement via Simulated Experience (Sahar Omidi Shayegan, Joshua Meyer, Victor Shih, Sebastian Sosa, Tianyi Peng, Kostis Kaffes, Eugene Wu, Andi Partovi, Mehdi Jamei) — simulation-first AI-agent training framework 🔄
• Q-learning with Posterior Sampling (Priyank Agrawal, Shipra Agrawal, Azmat Azati) — a new RL algorithm achieving near-optimal theory guarantees in tabular episodic MDPs 🎯
• Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (Xinyue Zhu*, Binghao Huang*, Yunzhu Li) — a scalable multimodal data collection system that empowers physical agents (i.e., robots) to interact with the world 🤖

#MachineLearning #AI #LLM #Systems #MultiAgent #NeurIPS
DAPLab @DAP__Lab
4. Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
by @nikospagonas00, @yeounoh, @kkaffes, @arvind_uw

In collaboration with Google, we introduce Cortex, a prototype workflow-aware serving platform designed for agentic workloads. The core principle of Cortex is stage isolation: it provisions dedicated resource pools for each distinct stage of an agentic workflow. This simple yet powerful strategy mitigates inter-stage interference in compute and memory, leading to better KV cache utilization, higher throughput, and more predictable performance. By customizing resource allocation and scheduling within each stage, Cortex lays the groundwork for more advanced, agent-native serving paradigms.
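The stage-isolation idea can be sketched abstractly: a dedicated worker pool per workflow stage, so a heavy stage cannot starve the others of workers. The stage names, pool sizes, and `submit` helper below are illustrative assumptions, not Cortex’s actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# One dedicated worker pool per agentic-workflow stage (sizes are illustrative).
# With shared workers, a long-running "execute" stage could exhaust capacity
# needed by "plan" or "verify"; separate pools remove that interference.
STAGE_POOLS = {
    "plan": ThreadPoolExecutor(max_workers=2),
    "execute": ThreadPoolExecutor(max_workers=8),
    "verify": ThreadPoolExecutor(max_workers=2),
}

def submit(stage, fn, *args):
    """Route a task to the pool provisioned for its stage."""
    return STAGE_POOLS[stage].submit(fn, *args)

result = submit("execute", lambda x: x * 2, 21).result()
```

The same per-stage split generalizes to GPU memory and KV cache: each stage gets resources sized for its own workload rather than competing in one shared pool.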
DAPLab @DAP__Lab
3. Toward Systems Foundations for Agentic Exploration
by Alex Xu, Tianle Zhou, @sirrice, @kkaffes

Digital environments are messy and stateful: every agent action perturbs hidden processes, files, and I/O, making single-shot execution brittle. Achieving high accuracy therefore requires exploration: trying alternatives, backtracking, and reusing partial progress. But exploration is only practical if agents can reliably roll environments forward and back, which in turn depends on fast, faithful state save/restore primitives. Our results point to a simple conclusion: scaling agent exploration beyond benchmarks and into real deployments will require native, fork-like primitives that let agents branch, isolate, and rejoin execution cheaply and consistently.
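The save/restore primitive the paper calls for can be illustrated in miniature: snapshot an environment, explore a risky action, and roll back when it fails. Deep-copying a dict is a toy stand-in; capturing real process, file, and I/O state faithfully and fast is exactly the hard systems problem the work targets. The `ToyEnv` class is our illustration, not the paper’s interface:

```python
import copy

class ToyEnv:
    """Illustrative stand-in for a stateful agent environment."""
    def __init__(self):
        self.files = {}

    def snapshot(self):
        # Capture the full state so execution can later be rolled back.
        return copy.deepcopy(self.files)

    def restore(self, snap):
        # Roll the environment back to a previously captured state.
        self.files = copy.deepcopy(snap)

env = ToyEnv()
env.files["config.txt"] = "original"
snap = env.snapshot()
env.files["config.txt"] = "risky edit"  # explore an alternative action
env.restore(snap)                       # backtrack when the attempt fails
```

A native fork-like primitive would let many such branches run in parallel and rejoin, instead of serially snapshotting and restoring one copy.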
DAPLab @DAP__Lab
This week, we are presenting a slate of new research at SOSP workshops, spanning agentic infrastructure and self-tuning kernels. Our work lays the foundation for a future agentic infrastructure that will enable the safe, reliable, and efficient operation of LLM agents in real-world environments. Highlights below; all papers are available on our website.