Bryan Helmig 🍻

1.2K posts


@bryanhelmig

@Zapier co-founder & CTO. Dad. Guitar picker.

St. Louis, MO · Joined June 2009
2.1K Following · 4K Followers
Bryan Helmig 🍻 @bryanhelmig
our next product is almost ready - agents that build and ship automations

we've been using it internally for months and ready to open it up to a small group of beta testers before it even has a name....

we're looking for:
- 10-100 person teams spending hours/week on manual ops
- teams tired of chat, ready to build and ship automations through agents
- teams that work in the open -- call recordings on, shared docs, public channels, etc
- teams obsessed with optimizing their work

apply: zapier.com/shared-brain
2 replies · 1 repost · 10 likes · 727 views
Bryan Helmig 🍻 reposted
Zapier @zapier
GPT-5.5 just hit 12.9% on our AutomationBench leaderboard. First model to break 10%.

When context is missing, most models stop. GPT-5.5 keeps checking emails, docs, and chats until it knows what to do.
[image]
1 reply · 4 reposts · 18 likes · 2.5K views
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
We built an AI benchmark that measures real work. Today we're releasing it to everyone.

AI evals tell you whether a model can do complex reasoning or generate code. Useful, but usually not the question our customers ask. They want to know: can this model find the right CRM record, send the right follow-up, and not break anything along the way?

We went looking for a benchmark that tested that. Nobody had built one, so we did.

@Zapier's AutomationBench drops AI models into realistic business environments across six domains (Sales, Marketing, Ops, Support, Finance, HR) and checks whether the work actually got done. The tasks include live CRM data, inbox threads with ambiguous context, and multi-step tool chains where one wrong call cascades.

Scoring is deterministic: either the right records were updated and the right messages were sent, or they weren't.

It's useful enough that we're releasing it publicly today. Open task set, open methodology, open leaderboard. Everyone should have access to this.

No model has cracked 10%. Yet.

Try it here: zapier.com/benchmarks
[image]
15 replies · 22 reposts · 131 likes · 35K views
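The deterministic scoring described above can be sketched as a plain set comparison over observed side effects: a task passes only if every required action happened and nothing extra was touched. This checker is illustrative only (not AutomationBench's actual harness), and the action tuples are made up for the example:

```python
# Illustrative deterministic grader: no partial credit, no judge model.
# A run passes iff its observed side effects exactly match the expected set.

def grade(expected: set, observed: set) -> bool:
    """True iff every required action happened and nothing extra broke."""
    missing = expected - observed   # required updates/messages not done
    extra = observed - expected     # stray writes that "break something"
    return not missing and not extra

# Expected side effects for a hypothetical task: update one CRM record
# and send one follow-up email.
expected = {
    ("crm.update", "acct_42", "stage=closed_won"),
    ("email.send", "jo@example.com", "follow-up"),
}

# Run 1: did both, nothing else -> pass.
assert grade(expected, set(expected)) is True

# Run 2: right email, but updated the wrong record -> fail.
observed = {
    ("crm.update", "acct_99", "stage=closed_won"),
    ("email.send", "jo@example.com", "follow-up"),
}
assert grade(expected, observed) is False
```

This is the sense in which "one wrong call cascades": a single stray write lands in `extra` and fails the whole task.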
Bryan Helmig 🍻 @bryanhelmig
i completely replaced mcp and now have 90+ custom cli scripts for jira, gdocs, lumapps, etc. built on top of the zapier sdk and cli (available here docs.zapier.com/sdk/quickstart)

anytime i need another, i just ask to create. anytime i hit a bug, i just ask for a fix. it's awesome.

plus, with our managed connections, claude/codex doesn't even have the raw api keys or oauth credentials!
2 replies · 2 reposts · 14 likes · 2.2K views
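The "custom cli scripts instead of mcp" pattern above amounts to: one small, single-purpose script per task, with credentials living in the environment rather than in the agent's context. A minimal illustrative sketch of one such script — the Jira env var names and default fields here are hypothetical placeholders, not the Zapier SDK's actual API:

```python
#!/usr/bin/env python3
"""Illustrative single-purpose CLI: print one Jira issue as JSON.

Hypothetical sketch -- JIRA_BASE_URL and JIRA_API_TOKEN are placeholder
env vars, not a real deployed config. The point of the pattern: the
agent runs this script; it never sees the token itself.
"""
import argparse
import json
import os
import urllib.request


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Print a Jira issue as JSON")
    p.add_argument("issue_key", help="e.g. OPS-123")
    p.add_argument("--fields", default="summary,status",
                   help="comma-separated fields to fetch")
    return p


def fetch_issue(issue_key: str, fields: str) -> dict:
    # Credentials stay in the environment, outside the agent's transcript.
    base = os.environ["JIRA_BASE_URL"]
    token = os.environ["JIRA_API_TOKEN"]
    req = urllib.request.Request(
        f"{base}/rest/api/2/issue/{issue_key}?fields={fields}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(json.dumps(fetch_issue(args.issue_key, args.fields), indent=2))
```

Because each script is tiny and self-describing (`--help` is the docs), "just ask the agent to create another one" or "ask for a fix" stays cheap.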
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
Today we open the Zapier SDK to everyone. If you're building with AI agents, this is for you.

I've been using this for 2 months. It's totally changed how I do my job.

You install it in your coding agent. Cursor, Claude Code, Codex, whatever you use. Now that agent has access to 8,000+ apps through @Zapier and can do anything those APIs can do.

I think it's the most powerful thing we've launched in years.

Now in open beta. Just give this link right to your agent: docs.zapier.com/sdk/quickstart
101 replies · 94 reposts · 1K likes · 173.8K views
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
That 1-man $1B company? Zapier is in the stack in a big way. Brand new world out there! Long live small teams.
Matthew Gallagher @galligator

@Jordy_vD_ Zapier automates about 4 million tasks a month right now 💪

4 replies · 3 reposts · 28 likes · 5.4K views
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
Today we released our new AI Fluency Rubric. We use it for every hire, focusing on what they've actually built.

Last May we open-sourced V1. Hundreds of companies used it to screen candidates and develop teams. It worked. But the floor moved fast.

An updated look at the 3 levels of AI fluency at @Zapier:
1. Capable: "I use AI to operate at a meaningfully higher level."
2. Adoptive: "I orchestrate AI and build systems that elevate how I work."
3. Transformative: "I re-engineer how work happens."

We evaluate these across 4 dimensions: Mindset, Strategy, Building, and Accountability.

We're sharing V2 publicly for the same reason we shared V1: every company needs a framework for this, and most don't have one yet.

Don't see your role? See all departments / learn more here: zpr.io/xQq5PHMDChrL
[image]
36 replies · 104 reposts · 1.3K likes · 242.7K views
Bryan Helmig 🍻 reposted
Mike Knoop @mikeknoop
ARC-AGI-3 and ARC Prize 2026 are now live with $2,000,000 in prizes!

As of today, version 3 is the world's only unsaturated agentic intelligence benchmark. Humans score 100% and frontier AI scores ~0%. Play here: arcprize.org/arc-agi/3

While no single version of ARC is definitionally AGI, our aim with the ARC-AGI Series is to continually produce useful scientific benchmarks which identify large remaining gaps between Humans and Frontier AI. At some point, we'll be unable to, and then we'll have AGI.

Our new benchmark consists of over 100 novel game environments encompassing nearly 1,000 levels. Notably, test takers are given no explicit goals (other than to win) and must explore the environments to acquire goals, understand rules, develop strategy, and ultimately execute a plan to win.

ARC-AGI-3 is a test of agentic intelligence. Beating this benchmark requires on-the-fly world modeling and continual learning to adapt to evolving environments. To score 100%, AI must beat all of the games as efficiently as the human baseline (e.g., the number of actions taken to win). An ARC first, this gives us a formal comparison of AI reasoning efficiency vs humans.

Version 3 carries classic ARC design principles: core knowledge priors only, private test sets to measure generalization, and it's fun! Every benchmark we release is an experiment, and I believe this new version will provide strong signal towards increasingly autonomous AI agents.

Prior versions of ARC held strong predictive power for important AI moments. Version 1 only saw progress with the release of AI reasoning models in late 2024, and Version 2 only began seeing progress with the advent of agentic coding models in late 2025. Version 3 is expected to signal when AI agents can become economically useful in more open-ended domains (beyond highly measurable domains like coding and math).

There are a few other important design changes for ARC-AGI-3. The public set is now a "demonstration" set, not a training set. And unlike prior versions, the private set is now explicitly designed to be Out Of Distribution (non-IID) from the public demo set. This is to mitigate targeting and because LLMs can now generalize over IID splits using AI reasoning.

Frontier models have made great progress over the past year. So much that several industry leaders have suggested we may already have AGI. Part of the ARC Prize Foundation mission is to provide accurate public sense finding, and we strive to reduce false-positive claims. To this end, we've updated our testing policy. Going forward, we will only verify scores outside of the official Kaggle competitions from AI systems that have high commercial usage or are 100% open source. We're also adopting a stateless client scoring philosophy to ensure humans and AI are tested under identical conditions. The goal of these changes is to reduce the amount of developer-aware targeting (whether incidental or intentional) and provide clear signal if actual AGI progress has occurred.

The Foundation also has a goal to inspire AI innovation, which is most likely to come from the community. We've seen dozens of startups using ARC as a tool for showcasing their ideas - a few have fundraised serious capital based on their ARC results. To support this, we're launching a new Community leaderboard. While scores for this leaderboard can't be Verified, and you should explicitly not trust these scores as an accurate measure of AGI progress, we will curate the best ideas and promote them. This year I expect we will see rapid progress on the ARC-AGI-3 Community leaderboard, and the best ideas will eventually migrate into frontier models and onto the Verified leaderboard.

Finally, we've partnered again with Kaggle to run two competition tracks, for ARC-AGI-2 and ARC-AGI-3. This will be the last year for Version 2. When we launched the first ARC Prize back in 2024, I committed to running the Grand Prize until it was beaten. So for the ARC-AGI-2 track we will be paying out the Grand Prize to the best team, no matter what, in order to honor this commitment. In accordance with the Foundation mission, to win any prize money you must open-source a reproducible solution. We raised the standard for open source to include training. I'm excited to produce a truly open solution as a final send-off for the ARC-AGI-1 and 2 format.

Focus is now on ARC-AGI-3 (we've even started work on Versions 4 and 5). As always, I'm honored to have the opportunity to steward attention towards AGI progress. I'm also super grateful to the incredible ARC Prize team - including our core engineers, game designers, and human testers - led by @GregKamradt, without whom we would not have this incredibly useful benchmark.

See you on the leaderboard!
11 replies · 22 reposts · 150 likes · 17.5K views
Ankur Goyal @ankrgyl
We sent this note to our customers to let them know that Braintrust has raised a new round of funding, and to thank them for their support.

While the money is exciting, our focus hasn't changed: we're building Braintrust to help our customers ship quality AI products. In 2026, AI is moving to production, but teams have never had less conviction about what will fail next. Our customers are building AI products that serve millions and simply need to work. If Braintrust makes their lives easier and their products better, I know we are doing our job.

Thank you to @ICONIQCapital for leading our Series B, and to @a16z, @GreylockVC, @basecasevc, and @eladgil for doubling down. Thank you to the Braintrust team for all the incredible work you've done over the past year. And thank you to our customers, who have made this growth possible.
[image]
40 replies · 28 reposts · 176 likes · 92K views
Bryan Helmig 🍻 @bryanhelmig
opus 4.6 feels faster but also does things that opus 4.5 wouldn't have, e.g. confusing repos/paths, etc.
0 replies · 0 reposts · 3 likes · 547 views
Bryan Helmig 🍻 @bryanhelmig
@claudeai wishlist item for claude code: upon exit, print the `claude -r <uuid>` command to resume the session...
0 replies · 0 reposts · 3 likes · 155 views
Bryan Helmig 🍻 @bryanhelmig
kudos to @Ubiquiti for an increasingly rare product that works just fine locally -- even when the internet is down
0 replies · 0 reposts · 1 like · 358 views
Jordan Coeyman @acoyfellow
don't all major code agents constantly fill the context window (and prune it, etc.)? ralph is an inverted approach to thinking about context, if i'm grasping it right.

my typical cursor flow: ask > plan > agent. feels reliable.. but i am the harness. i feel like i'm impeding progress.

with a ralph loop and a way to mentally manage the context, it feels like my job shifts into putting the harnesses (tests, proofs of concept, reliable production tests) in place and letting the AI decide the details.

no more complaining about "oh i'm so tired of hearing claude say 'you're absolutely right'!"... or even being mad at llms. i'm guilty no more of cursing. poor little ralphs is all they are.

we've been trying to just overload context windows maybe. i'm sceptical enough to test it but bought in enough to know it's inevitable.

x.com/MichaelArnaldi… is a great read.
1 reply · 0 reposts · 1 like · 64 views
Bryan Helmig 🍻 @bryanhelmig
the primary reason ralph wiggum is effective is because it forces you to plan upfront
2 replies · 0 reposts · 5 likes · 1.7K views
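As the exchange above suggests, the "ralph" pattern is essentially a dumb outer loop: write the plan once, up front, then keep re-running a fresh agent against it until the harness (the tests) passes. A minimal Python sketch of that control flow, with stubs standing in for the real agent CLI and test suite (the stubs, the `workspace` dict, and the convergence-on-the-3rd-try behavior are all invented for illustration):

```python
# Hypothetical sketch of a "ralph" loop. Every iteration starts a fresh
# agent run from the same upfront plan; the only memory carried between
# iterations is the workspace itself and whether the harness passes.

PLAN = "make fizzbuzz(n) return 'fizz'/'buzz'/'fizzbuzz' correctly"


def run_agent(plan: str, workspace: dict) -> None:
    """Stub agent: stands in for e.g. shelling out to a coding agent CLI.
    Pretends to succeed on the 3rd fresh attempt."""
    workspace["attempts"] += 1
    if workspace["attempts"] >= 3:
        workspace["code_ok"] = True


def harness_passes(workspace: dict) -> bool:
    """Stub harness: stands in for running the real test suite."""
    return workspace.get("code_ok", False)


def ralph_loop(plan: str, workspace: dict, max_iters: int = 10) -> bool:
    for _ in range(max_iters):
        if harness_passes(workspace):
            return True
        run_agent(plan, workspace)  # fresh context every time, same plan
    return harness_passes(workspace)
```

This is why the loop "forces you to plan upfront": with no mid-conversation steering, all the leverage sits in the plan and the harness, which matches the shift of the operator's job described in the reply above.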
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
Excited to see what people build with OpenAI's Agent Builder + Zapier MCP. Together you can build Agents that go beyond Chat and do work within your entire stack (Zapier MCP hooks into over 8,000 applications). A few early favorites 👇

* Streamline Onboarding: Use secure People Team credentials to create accounts, assign training in your LMS, and schedule manager check-ins automatically.
* Improve Team Engagement: Summarize sentiment from your survey tool, draft follow-up actions, and send those directly to managers with talking points.
* Speed Up Customer Support: Triage tickets, enrich them with CRM data, and route alerts to the right team channel.
* Streamline Procurement: Validate spend requests, check budgets in your ERP, and push requests to approvers.
* Weekly Project Reporting: Pull from project tracking updates, code commits, and design files to generate polished summaries.

What will you build with Agent Builder + Zapier MCP?
4 replies · 4 reposts · 40 likes · 14.6K views
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
Every new thing launched at ZapConnect:

First up: Unified Copilot. Type what you want, and it builds across Zaps, Tables, Interfaces, and Agents:
- Workflows in minutes (not days)
- Context flows everywhere (no rework)
- AI does the wiring so you focus on strategy
(Try it here: zpr.io/7BvrbugqNBze)

We also shipped:
1. Human-in-the-loop
2. Over 30 new AI apps: Perplexity, Mistral, Cursor, DeepSeek + more
3. Agent Sharing: Build once, deploy to your whole org
4. Enterprise controls your IT team will actually like

Oh, and Tables + Interfaces are now included in every @Zapier plan.

Missed the event? Replay's up now.
3 replies · 10 reposts · 47 likes · 21.4K views
Bryan Helmig 🍻 reposted
Wade Foster @wadefoster
Zapier MCP is now available inside ChatGPT. That means you can connect to 8,000+ apps and trigger Zapier workflows, just by writing what you want to happen. @ChatGPTapp + @Zapier MCP will figure out the right tools for the job and run the actions for you.

Getting started is easy:
1. Head to the Zapier ChatGPT MCP Server and add the tools you want ChatGPT to access
2. Follow the steps in the "Connect" tab
3. If you're an admin on a ChatGPT Business or Enterprise account, you'll see extra steps there to enable MCP across your workspace

Try it here: mcp.zapier.com/mcp/servers?cl…
[GIF]
8 replies · 5 reposts · 34 likes · 3.3K views
Bryan Helmig 🍻 @bryanhelmig
zapier agents can work as a team now. killer feature! also, we're hiring! links in thread.
3 replies · 1 repost · 7 likes · 510 views
Bryan Helmig 🍻 @bryanhelmig
a couple notable points about gpt-5 launch today:
* cached token pricing for gpt-5 is 10x cheaper v. ~4x in 4.1/o3/etc.
* introduces context-free grammars (!!!) as an alternative to JSON in tool calling
* more: preambles to tool calls, "minimal" reasoning, restricting to N tools
0 replies · 0 reposts · 2 likes · 305 views
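To make the context-free-grammar point concrete: instead of forcing tool arguments into a JSON object, the model's raw output for a tool call can be constrained to match a grammar, so anything it emits is parseable by construction. An illustrative Lark-style grammar for a narrow SQL-ish call (the rule names and the query shape here are invented for the example; they're not from the launch materials):

```
start: "SELECT " columns " FROM " table where?
columns: NAME ("," " "? NAME)*
table: NAME
where: " WHERE " NAME " = " VALUE
NAME: /[a-zA-Z_][a-zA-Z0-9_]*/
VALUE: /'[^']*'/
```

Under a constraint like this, `SELECT name, stage FROM deals WHERE owner = 'bryan'` is a legal output while free-form prose or malformed quoting simply can't be generated, which is what makes CFGs an alternative to JSON-schema-validated tool arguments.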