Willy Douhard
@willy_douhard
104 posts
Building @trysummon_com
San Francisco · Joined September 2016
92 Following · 291 Followers
Willy Douhard retweeted
Dan Constantini @danoandco
Twill.ai update: new runtime, new features! We rebuilt the runtime that drives every Twill task and open-sourced it as agentbox 👉 github.com/TwillAI/agentb…

New:
- Live preview pane: see your app run, open a terminal, watch entrypoint logs
- Reasoning level control (per task)
- Token-level streaming of agent responses
- Message editing (rewind a run from any prior message)

Faster:
- Sandbox setup ~5x faster
- Follow-up response start ~4x faster

More info here: twill.ai/newsletter/202…
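Message editing as described above — rewinding a run from any prior message — can be modeled as truncating the conversation history at the edited message and replaying from there. A minimal sketch in plain Python; the `Message` type and `rewind` function are illustrative stand-ins, not agentbox's actual API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def rewind(history: list[Message], index: int, new_content: str) -> list[Message]:
    """Replace the message at `index`, drop everything after it, and return
    the history the agent should be re-run from."""
    if not 0 <= index < len(history):
        raise IndexError("no such message in this run")
    edited = Message(history[index].role, new_content)
    return history[:index] + [edited]


# Editing message 1 discards message 2 onward, so the agent replays from there.
run = [Message("user", "add a login page"),
       Message("assistant", "done, used basic auth"),
       Message("user", "ship it")]
replay = rewind(run, 1, "done, used OAuth")
```

The design choice worth noting: rewinding returns a new history rather than mutating the old one, so the original run stays inspectable.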
Willy Douhard retweeted
Dan Constantini @danoandco
New mode in Twill: Claude codes, Codex reviews. Until they converge.

Ralph loop is a new opt-in mode for complex tasks where you want more rigor than a single agent pass. You select it and set a budget when you create the task.

How it works
1. You set a budget and describe the task
2. A criteria agent explores your repo and proposes acceptance criteria
3. You review, refine, and approve
4. Claude implements against the criteria
5. Codex verifies the result against the criteria. Pass or fail.
6. On fail, the feedback goes back in. Claude continues to work.
7. The loop runs until the criteria pass or the budget runs out.

Two things make this different from a normal agent run.

Verifiable criteria before code. Separating "what does done look like" from "write the code" forces the ambiguity to surface upfront, in a document you can read and edit, instead of mid-implementation when it's expensive.

Cross-model verification. The verifier gets the criteria and the repo state. It has no memory of the implementation decisions and no attachment to the approach. It reads the code cold, the same way a reviewer sees a PR for the first time. A model checking its own output tends to confirm what it intended, not what it produced. A different model doesn't have that bias.

The name comes from @GeoffreyHuntley's Ralph loop pattern, a bash one-liner that runs a coding agent in a tight loop with full context resets. Same iterative philosophy, different mechanism: structured criteria up front, cross-model verification at each pass.
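The seven steps above can be sketched as control flow. This is a toy Python version of the loop, not Twill's implementation: the criteria, implementer, and verifier agents are plain callables standing in for the criteria agent, Claude, and Codex.

```python
from typing import Callable


def ralph_loop(task: str,
               propose_criteria: Callable[[str], list[str]],
               implement: Callable[[str, list[str], str], str],
               verify: Callable[[str, list[str]], tuple[bool, str]],
               budget: int) -> tuple[bool, str]:
    """Implement/verify loop: run until the verifier passes the criteria
    or the budget is exhausted."""
    criteria = propose_criteria(task)   # steps 2-3: criteria agreed before code
    feedback = ""
    state = ""
    for _ in range(budget):
        state = implement(task, criteria, feedback)   # step 4: implementer pass
        passed, feedback = verify(state, criteria)    # step 5: cold cross-check
        if passed:
            return True, state
        # step 6: failure feedback feeds the next implementation pass
    return False, state                               # step 7: budget ran out
```

The key structural point the sketch preserves: the verifier only ever sees the task's criteria and the resulting state, never the implementer's reasoning, which is what makes the check "cold".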
Willy Douhard retweeted
Dan Constantini @danoandco
We've been open-sourcing some of the reusable agent skills we built inside @twill_ai

computer-use-cli: give any coding CLI the ability to see and control a Linux GUI. Claude Code just shipped computer use natively, but this is useful for Codex, opencode, and anything else that runs bash

video-recording: turn a verified browser flow into a demo video that actually looks human. Cursor movement, pacing, trimming dead time

github.com/TwillAI/skills
Willy Douhard retweeted
Dan Constantini @danoandco
A cloud coding agent saying "done" is not proof. You need proof of completion.

How do you increase trust in your cloud agent system?
→ Review subagents in the agent inner loop that inspect the work independently
→ Live previews so you can test the result yourself
→ Screenshots that show what changed
→ Screen recordings that show the flow working end to end. We use WebReel from @vercel for this

@twill_ai now supports all four proof types!
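The "done is not proof" idea can be enforced mechanically: refuse to mark a task complete unless at least one verifiable artifact backs the claim. A hedged sketch — the four proof kinds mirror the list above, but the `Task` type and its gate are illustrative, not Twill's API.

```python
from dataclasses import dataclass, field

# The four proof types from the post: review subagents, live previews,
# screenshots, and screen recordings.
PROOF_KINDS = {"review", "preview", "screenshot", "recording"}


@dataclass
class Task:
    description: str
    proofs: list[str] = field(default_factory=list)  # proof kinds attached so far

    def attach(self, kind: str) -> None:
        if kind not in PROOF_KINDS:
            raise ValueError(f"unknown proof kind: {kind}")
        self.proofs.append(kind)

    def mark_done(self) -> bool:
        # An agent saying "done" with no attached proof artifact is rejected.
        return bool(self.proofs)
```

The gate is deliberately dumb: it doesn't judge proof quality, it only refuses the zero-evidence case, which is the failure mode the post is about.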
Willy Douhard retweeted
Dan Constantini @danoandco
Cloud agents are powerful: task parallelisation, self-verification, long-running. But they're missing a UX that feels local. So we built the @twill_ai CLI.

The Twill CLI lets you create and manage remote coding agent sessions from your terminal while @AnthropicAI Claude Code, @opencode, or @OpenAI Codex runs the work inside persistent cloud sandboxes.

- Create and manage tasks from the CLI
- Switch between agent modes like plan, code, and ask
- Keep the familiar terminal workflow while the runtime lives in the cloud
- Run longer-lived and parallel work without tying it to your laptop

Try it at twill.ai
Willy Douhard retweeted
Dan Constantini @danoandco
Can AI agents actually navigate your codebase? @OpenAI just shared a rubric for evaluating exactly that.

The insight is that agent performance isn't just about model quality. Repo structure, docs, and verification paths all affect whether an agent works quickly and reliably.

Their rubric includes:
- Bootstrap self-sufficiency
- Task entry points
- Validation harness
- Lint and format gates
- Agent repo map
- Structured docs
- Decision records

Once you start delegating real engineering work to agents, these qualities matter a lot.

We built this scorecard into Twill. Paste a public GitHub repo or connect your private one, get a live score, supporting evidence, and concrete next steps.

Try it: twill.ai/scorecard
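A few of the rubric items above reduce, in their crudest form, to checking for conventional files in the repo. A toy scorer to make that concrete — the file names (AGENTS.md, Makefile, docs/, .github/workflows/) are common conventions I'm assuming, and the real scorecard at twill.ai/scorecard presumably evaluates far more than file presence.

```python
from pathlib import Path

# Rubric item -> files/dirs whose presence counts as weak evidence.
# These names are assumptions based on common convention, not Twill's checks.
CHECKS = {
    "bootstrap self-sufficiency": ["Makefile", "justfile", "scripts/setup.sh"],
    "agent repo map":             ["AGENTS.md", "CLAUDE.md"],
    "structured docs":            ["docs", "README.md"],
    "validation harness":         [".github/workflows", "tox.ini", "noxfile.py"],
}


def score_repo(root: str) -> dict[str, bool]:
    """True for each rubric item that has at least one piece of file evidence."""
    base = Path(root)
    return {item: any((base / f).exists() for f in files)
            for item, files in CHECKS.items()}
```

Presence checks like this catch only the floor of each rubric item; "supporting evidence" in the real product implies reading the files, not just finding them.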
Willy Douhard retweeted
dex @dexhorthy
4 people have texted me in the last hour: "is claude being extra dumb right now?"

don't know what you're talking about, this looks great
Willy Douhard retweeted
Twill @twill_ai
🔥 Chainlit now supports @AnthropicAI's Model Context Protocol (MCP)! Check out our demo with @Stripe's MCP. Compatible with popular frameworks like @langchain. Link to the code at the end of the thread 👇
Harrison Chase @hwchase17
This is a good take: lots of servers, need more clients. Will attempt to build a client today. It's mostly FE so I'll be vibe coding with Claude 🙃

Initial Qs:
- still no remote MCP servers?
- can I just support tools? are prompts/resources really needed?

If in SF, come join!
Willy Douhard retweeted
Twill @twill_ai
🚀 Exciting demo with @humanlayer_dev and @OpenAI 🚀 Build a next-gen customer support agent that handles queries but loops in a human for critical actions, ensuring precision with a human touch!
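The pattern in the demo — the agent handles routine queries itself but blocks on human approval for critical actions — can be sketched as a gate around tool calls. The tool names and the `approve` callback here are made up for illustration; the actual demo uses @humanlayer_dev's SDK rather than this hand-rolled gate.

```python
from typing import Callable

# Actions that must loop in a human before executing (illustrative names).
CRITICAL = {"issue_refund", "delete_account"}


def run_tool(name: str, args: dict,
             tools: dict[str, Callable[..., str]],
             approve: Callable[[str, dict], bool]) -> str:
    """Execute a tool call, but require human approval for critical actions."""
    if name in CRITICAL and not approve(name, args):
        return f"{name} blocked: human rejected the request"
    return tools[name](**args)


tools = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
    "issue_refund": lambda order_id: f"refund issued for {order_id}",
}
```

The gate sits between the agent and the tool, not inside the model's prompt, so a misbehaving model cannot talk its way past it.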
Willy Douhard retweeted
Twill @twill_ai
🚀 New @LiteLLM Integration 🚀

Track & log all your LiteLLM calls with @literalai in just 2 lines of code!

LiteLLM lets you interact with 100+ LLMs (@OpenAI, @AnthropicAI, @MistralAI, etc.) through a consistent OpenAI-compatible format; use either their Python SDK or their proxy server.

We also added an example using the LiteLLM proxy within a @chainlit_io app 🎉
Willy Douhard @willy_douhard
🎙️ Chainlit Realtime is here! 🎙️ Featuring first-class WebSocket support for realtime audio interactions in Chainlit applications. We've added support for the @OpenAI Realtime API to unlock a whole new UX for devs building intelligent, responsive assistants.
Willy Douhard @willy_douhard
👏Bonus👏: This is available for Chainlit copilot and custom frontend as well! As usual, all interactions, including audio files, can be logged in @literalai.
Willy Douhard retweeted
Dan Constantini @danoandco
🎉 Introducing @literalai Prompt and LLM A/B Testing 📈

Gradually roll out new prompts or LLMs in production and compare performance metrics, reducing risk 🚀

Product teams can independently deploy prompt or LLM updates, speeding up iteration and freeing engineering resources