Willy Douhard
@willy_douhard
104 posts
Building @trysummon_com
San Francisco · Joined September 2016
92 Following · 291 Followers
Willy Douhard retweeted
Dan Constantini @danoandco
Twill.ai update: new runtime, new features! We rebuilt the runtime that drives every Twill task and open-sourced it as agentbox 👉 github.com/TwillAI/agentb…

New:
- Live preview pane: see your app run, open a terminal, watch entrypoint logs
- Reasoning level control (per task)
- Token-level streaming of agent responses
- Message editing (rewind a run from any prior message)

Faster:
- Sandbox setup ~5x faster
- Follow-up response start ~4x faster

More info here: twill.ai/newsletter/202…
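Message editing as described above — rewinding a run from any prior message — can be modeled as truncating the conversation history at the edited message and replaying from there. A minimal sketch in plain Python; the `Message` type and `rewind` function are illustrative stand-ins, not agentbox's actual API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def rewind(history: list[Message], index: int, new_content: str) -> list[Message]:
    """Replace the message at `index`, drop everything after it, and return
    the history the agent should be re-run from."""
    if not 0 <= index < len(history):
        raise IndexError("no such message in this run")
    edited = Message(history[index].role, new_content)
    return history[:index] + [edited]


# Editing message 1 discards message 2 onward, so the agent replays from there.
run = [Message("user", "add a login page"),
       Message("assistant", "done, used basic auth"),
       Message("user", "ship it")]
replay = rewind(run, 1, "done, used OAuth")
```

The design choice worth noting: rewinding returns a new history rather than mutating the old one, so the original run stays inspectable.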
Willy Douhard retweeted
Dan Constantini @danoandco
New mode in Twill: Claude codes, Codex reviews. Until they converge.

Ralph loop is a new opt-in mode for complex tasks where you want more rigor than a single agent pass. You select it and set a budget when you create the task.

How it works
1. You set a budget and describe the task
2. A criteria agent explores your repo and proposes acceptance criteria
3. You review, refine, and approve
4. Claude implements against the criteria
5. Codex verifies the result against the criteria. Pass or fail.
6. On fail, the feedback goes back in. Claude continues to work.
7. The loop runs until the criteria pass or the budget runs out.

Two things make this different from a normal agent run.

Verifiable criteria before code. Separating "what does done look like" from "write the code" forces the ambiguity to surface upfront, in a document you can read and edit, instead of mid-implementation when it's expensive.

Cross-model verification. The verifier gets the criteria and the repo state. It has no memory of the implementation decisions and no attachment to the approach. It reads the code cold, the same way a reviewer sees a PR for the first time. A model checking its own output tends to confirm what it intended, not what it produced. A different model doesn't have that bias.

The name comes from @GeoffreyHuntley's Ralph loop pattern, a bash one-liner that runs a coding agent in a tight loop with full context resets. Same iterative philosophy, different mechanism: structured criteria up front, cross-model verification at each pass.
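The seven steps above can be sketched as control flow. This is a toy Python version of the loop, not Twill's implementation: the criteria, implementer, and verifier agents are plain callables standing in for the criteria agent, Claude, and Codex.

```python
from typing import Callable


def ralph_loop(task: str,
               propose_criteria: Callable[[str], list[str]],
               implement: Callable[[str, list[str], str], str],
               verify: Callable[[str, list[str]], tuple[bool, str]],
               budget: int) -> tuple[bool, str]:
    """Implement/verify loop: run until the verifier passes the criteria
    or the budget is exhausted."""
    criteria = propose_criteria(task)   # steps 2-3: criteria agreed before code
    feedback = ""
    state = ""
    for _ in range(budget):
        state = implement(task, criteria, feedback)   # step 4: implementer pass
        passed, feedback = verify(state, criteria)    # step 5: cold cross-check
        if passed:
            return True, state
        # step 6: failure feedback feeds the next implementation pass
    return False, state                               # step 7: budget ran out
```

The key structural point the sketch preserves: the verifier only ever sees the task's criteria and the resulting state, never the implementer's reasoning, which is what makes the check "cold".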
Willy Douhard retweeted
Dan Constantini @danoandco
We've been open-sourcing some of the reusable agent skills we built inside @twill_ai

computer-use-cli: give any coding CLI the ability to see and control a Linux GUI. Claude Code just shipped computer use natively, but this is useful for Codex, opencode, and anything else that runs bash

video-recording: turn a verified browser flow into a demo video that actually looks human. Cursor movement, pacing, trimming dead time

github.com/TwillAI/skills
Willy Douhard retweeted
Dan Constantini @danoandco
A cloud coding agent saying "done" is not proof. You need proof of completion.

How do you increase trust in your cloud agent system?
→ Review subagents in the agent inner loop that inspect the work independently
→ Live previews so you can test the result yourself
→ Screenshots that show what changed
→ Screen recordings that show the flow working end to end. We use WebReel from @vercel for this

@twill_ai now supports all four proof types!
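The "done is not proof" idea can be enforced mechanically: refuse to mark a task complete unless at least one verifiable artifact backs the claim. A hedged sketch — the four proof kinds mirror the list above, but the `Task` type and its gate are illustrative, not Twill's API.

```python
from dataclasses import dataclass, field

# The four proof types from the post: review subagents, live previews,
# screenshots, and screen recordings.
PROOF_KINDS = {"review", "preview", "screenshot", "recording"}


@dataclass
class Task:
    description: str
    proofs: list[str] = field(default_factory=list)  # proof kinds attached so far

    def attach(self, kind: str) -> None:
        if kind not in PROOF_KINDS:
            raise ValueError(f"unknown proof kind: {kind}")
        self.proofs.append(kind)

    def mark_done(self) -> bool:
        # An agent saying "done" with no attached proof artifact is rejected.
        return bool(self.proofs)
```

The gate is deliberately dumb: it doesn't judge proof quality, it only refuses the zero-evidence case, which is the failure mode the post is about.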
Willy Douhard retweeted
Dan Constantini @danoandco
Cloud agents are powerful: task parallelisation, self-verification, long-running. But they're missing a UX that feels local. So we built the @twill_ai CLI.

The Twill CLI lets you create and manage remote coding agent sessions from your terminal while @AnthropicAI Claude Code, @opencode, or @OpenAI Codex runs the work inside persistent cloud sandboxes.

- Create and manage tasks from the CLI
- Switch between agent modes like plan, code, and ask
- Keep the familiar terminal workflow while the runtime lives in the cloud
- Run longer-lived and parallel work without tying it to your laptop

Try it at twill.ai
Willy Douhard retweeted
Dan Constantini @danoandco
Can AI agents actually navigate your codebase? @OpenAI just shared a rubric for evaluating exactly that.

The insight is that agent performance isn't just about model quality. Repo structure, docs, and verification paths all affect whether an agent works quickly and reliably.

Their rubric includes:
- Bootstrap self-sufficiency
- Task entry points
- Validation harness
- Lint and format gates
- Agent repo map
- Structured docs
- Decision records

Once you start delegating real engineering work to agents, these qualities matter a lot.

We built this scorecard into Twill. Paste a public GitHub repo or connect your private one, get a live score, supporting evidence, and concrete next steps.

Try it: twill.ai/scorecard
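A few of the rubric items above reduce, in their crudest form, to checking for conventional files in the repo. A toy scorer to make that concrete — the file names (AGENTS.md, Makefile, docs/, .github/workflows/) are common conventions I'm assuming, and the real scorecard at twill.ai/scorecard presumably evaluates far more than file presence.

```python
from pathlib import Path

# Rubric item -> files/dirs whose presence counts as weak evidence.
# These names are assumptions based on common convention, not Twill's checks.
CHECKS = {
    "bootstrap self-sufficiency": ["Makefile", "justfile", "scripts/setup.sh"],
    "agent repo map":             ["AGENTS.md", "CLAUDE.md"],
    "structured docs":            ["docs", "README.md"],
    "validation harness":         [".github/workflows", "tox.ini", "noxfile.py"],
}


def score_repo(root: str) -> dict[str, bool]:
    """True for each rubric item that has at least one piece of file evidence."""
    base = Path(root)
    return {item: any((base / f).exists() for f in files)
            for item, files in CHECKS.items()}
```

Presence checks like this catch only the floor of each rubric item; "supporting evidence" in the real product implies reading the files, not just finding them.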
Willy Douhard retweeted
dex @dexhorthy
4 people have texted me in the last hour: "is claude being extra dumb right now?"

don't know what you're talking about, this looks great
Willy Douhard retweeted
Twill @twill_ai
🔥 Chainlit now supports @AnthropicAI's Model Context Protocol (MCP)! Check out our demo with @Stripe's MCP. Compatible with popular frameworks like @langchain. Link to the code at the end of the thread 👇
Harrison Chase @hwchase17
This is a good take: lots of servers, need more clients. Will attempt to build a client today. It's mostly FE so I'll be vibe coding with Claude 🙃

Initial Qs:
- still no remote MCP servers?
- can I just support tools? are prompts/resources really needed?

If in SF, come join!
Willy Douhard retweeted
Twill @twill_ai
🚀 Exciting demo with @humanlayer_dev and @OpenAI 🚀 Build a next-gen customer support agent that handles queries but loops in a human for critical actions, ensuring precision with a human touch!
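The pattern in the demo — the agent handles routine queries itself but blocks on human approval for critical actions — can be sketched as a gate around tool calls. The tool names and the `approve` callback here are made up for illustration; the actual demo uses @humanlayer_dev's SDK rather than this hand-rolled gate.

```python
from typing import Callable

# Actions that must loop in a human before executing (illustrative names).
CRITICAL = {"issue_refund", "delete_account"}


def run_tool(name: str, args: dict,
             tools: dict[str, Callable[..., str]],
             approve: Callable[[str, dict], bool]) -> str:
    """Execute a tool call, but require human approval for critical actions."""
    if name in CRITICAL and not approve(name, args):
        return f"{name} blocked: human rejected the request"
    return tools[name](**args)


tools = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
    "issue_refund": lambda order_id: f"refund issued for {order_id}",
}
```

The gate sits between the agent and the tool, not inside the model's prompt, so a misbehaving model cannot talk its way past it.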
Willy Douhard retweeted
Twill @twill_ai
🚀 New @LiteLLM Integration 🚀

Track & log all your LiteLLM calls with @literalai in just 2 lines of code!

LiteLLM lets you interact with 100+ LLMs (@OpenAI, @AnthropicAI, @MistralAI, etc.) through a consistent OpenAI-compatible format; use either their Python SDK or their proxy server.

We also added an example using the LiteLLM proxy within a @chainlit_io app 🎉
Willy Douhard @willy_douhard
🎙️ Chainlit Realtime is here! 🎙️ Featuring first-class WebSocket support for realtime audio interactions in Chainlit applications. We've added support for the @OpenAI Realtime API to unlock a whole new UX for devs building intelligent, responsive assistants.
Willy Douhard @willy_douhard
👏Bonus👏: This is available for Chainlit copilot and custom frontend as well! As usual, all interactions, including audio files, can be logged in @literalai.
Willy Douhard retweeted
Dan Constantini @danoandco
🎉 Introducing @literalai Prompt and LLM A/B Testing 📈

Gradually roll out new prompts or LLMs in production and compare performance metrics, reducing risk 🚀

Product teams can independently deploy prompt or LLM updates, speeding up iteration and freeing engineering resources