Ben Callow💹🧲

1.2K posts


@ben_callow

Founder @keystone_group1 — AI-native automation consultancy for UK SMEs. Building in public.

England, United Kingdom · Joined April 2013
933 Following · 369 Followers
Ben Callow💹🧲@ben_callow·
@misfitwriter_ Absolutely, tested all sorts of strategies for months. Bonkers that the simplest change is what had the greatest impact. The metrics don't lie though!!
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 7
A.Misfit@misfitwriter_·
You can post every day for a year. Build in public. Build in silence. Doesn’t matter!

If the market lands on your profile and feels nothing specific, you’re not building a brand. You’re building a posting habit.

Here’s the truth nobody wants to hear: consistency alone doesn’t create trust. Perceived authority does.

And right now, most creators look identical. Same content. Same hooks. Same CTAs. Same recycled “value.”

So when buyers land on your page, they don’t see category authority. They see another option. That’s not a content problem. That’s a perception architecture failure.

The market doesn’t buy the most consistent creator. It buys the one it already trusts before the DM ever starts.

Meaning: you are likely losing clients before conversations even exist. Not because your offer sucks. Not because you aren’t posting enough. But because your brand fails to create immediate trust differentiation.

Consistency kept you active. Perception would have made you money.
Replies: 28 · Reposts: 2 · Likes: 37 · Views: 568
Ben Callow💹🧲@ben_callow·
The part that surprises teams: the policy itself becomes a dataset. After 30 days you can see exactly which task classes never needed the escalation and drop the cost threshold. The ceiling is a starting point, not a setting.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 6
Ben Callow💹🧲@ben_callow·
Your LLM bill is a mystery because you buried the decision in app code.

Hard-code ‘use Model X’ into a workflow and you get three problems: no cost ceiling, brittle fallbacks, and quality tweaks that require a deploy. The bill only becomes ‘visible’ after finance asks why it spiked.

Every run is a tiny purchasing decision: which model, how many tokens, what retries, what context size. Multiply that by a few thousand runs a month and the difference between ‘good enough’ and ‘best every time’ is no longer a rounding error.

Here’s the routing policy I push teams to implement as a first-class layer (I call it the Policy Plane):

1) Put a price tag on every run (pull live model pricing so each request has an estimated £ cost).
2) Classify tasks by risk: low-risk summarise/extract, medium-risk draft, high-risk client-facing.
3) Start cheap by default, then escalate only on failure signals (low confidence, missing fields, guardrail trip).
4) Set hard ceilings: per-run max £, daily budget, and a ‘degrade mode’ when you hit limits.
5) Build a fallback chain you can change without redeploying: primary → secondary → ‘human review’.

In n8n this is finally practical: the Model Selector node gives you the routing point, and OpenRouter pricing can be pulled into the workflow so you can compute cost per request before you commit. n8n even published a template that compares token costs across hundreds of models using OpenRouter pricing (n8n.io/workflows/1210…) and there’s a solid walkthrough of Model Selector patterns here (automategeniushub.com/guide-to-use-t…).

If you can’t explain, in one sentence, why a given run picked a given model, you do not have an automation, you have a spend generator.

Model choice stops being a product decision once you run automations at scale; it’s an operations policy and it belongs in your automation layer. The boundary is the whole game: decide once (in the workflow), then tune the policy as you learn.
Want the automation audit template we use with UK SME clients? It's free. Reply AUTOMATE below and I'll DM you the free audit template. #UKBusiness #AIAutomation
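For anyone who wants to see the shape of this, here is a minimal Python sketch of the Policy Plane idea. The model names, prices, chains, and ceilings are all invented for illustration; in a real n8n workflow the routing point would be the Model Selector node and the prices would come from live OpenRouter pricing rather than a hard-coded dict.

```python
from dataclasses import dataclass

# Illustrative prices in £ per 1K tokens -- in practice, pull live pricing.
PRICES = {"cheap-model": 0.0004, "mid-model": 0.003, "frontier-model": 0.01}

# Fallback chains per risk class: start cheap, escalate only on failure signals.
CHAINS = {
    "low": ["cheap-model"],
    "medium": ["cheap-model", "mid-model"],
    "high": ["mid-model", "frontier-model", "human-review"],
}

@dataclass
class PolicyPlane:
    per_run_ceiling: float = 0.05   # hard max £ per run
    daily_budget: float = 20.0      # hard max £ per day
    spent_today: float = 0.0

    def route(self, risk, est_tokens, failed=()):
        """Pick the first affordable model in the chain that hasn't failed yet."""
        for model in CHAINS[risk]:
            if model in failed:
                continue                                # escalate past failure signals
            if model == "human-review":
                return model, 0.0                       # end of chain: hand off
            cost = PRICES[model] * est_tokens / 1000    # price tag on this run
            if cost > self.per_run_ceiling:
                continue
            if self.spent_today + cost > self.daily_budget:
                return "degrade", 0.0                   # budget hit: degrade mode
            self.spent_today += cost
            return model, cost
        return "degrade", 0.0
```

The point isn't the code; it's that the decision lives in one place and every run gets a price before it gets a model.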
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 33
Ben Callow💹🧲@ben_callow·
@ferz_erz00 @Wayforthio Idempotency keys + a dead-letter queue beat a “smarter” agent. Most failures are retries, 429s, and partial writes, not reasoning. If the tool layer doesn’t own on-call + audit logs, it’s a demo.
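A minimal Python sketch of that pattern, with plain dicts standing in for a real KV store and queue (all names are illustrative): the idempotency key makes retries and partial-write replays safe, and anything that exhausts its retries lands in a dead-letter queue instead of vanishing.

```python
import hashlib
import json

processed = {}      # idempotency store: key -> result (a KV store in production)
dead_letters = []   # failed payloads parked for inspection and replay

def idempotency_key(payload):
    # Deterministic key from the payload, so a retried request maps
    # to the same write instead of producing a duplicate.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def handle(payload, write, max_retries=3):
    key = idempotency_key(payload)
    if key in processed:
        return processed[key]        # duplicate delivery: return the cached result
    error = None
    for _ in range(max_retries):     # absorbs transient failures like 429s
        try:
            result = write(payload)  # the side-effecting call (API, DB, ...)
            processed[key] = result
            return result
        except Exception as exc:
            error = exc
    # Out of retries: park it for a human instead of looping an agent on it.
    dead_letters.append({"key": key, "payload": payload, "error": str(error)})
    return None
```

Notice there is no model call anywhere in it, which is the point of the reply above.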
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 34
𝔽𝔼ℝℤ (♞,♞)
I stopped thinking the problem with agents was intelligence. at some point they were already “smart enough” to do most of the tasks we keep giving them; they can reason, plan, break things down, even recover from mistakes. but the moment you try to make them do real work, like connecting to an API, handling keys, or pulling anything from outside their bubble, everything slows down in a way that has nothing to do with the model itself.

every time an agent needed to actually do something useful, it had to step out of itself and enter this messy layer of the internet, APIs that don’t speak the same language, keys that expire, dashboards, docs, random onboarding flows that break the moment you scale anything. and the strange part is, nobody really questions it anymore. We just accepted that “real work” for agents means rebuilding the same plumbing over and over again.

that’s where @Wayforthio starts to feel different. not because it adds more intelligence, but because it removes the need to keep rebuilding access to the world. you don’t have to go hunting for tools anymore, you simply describe what you need in plain language and the system pulls back real services that already exist, already tested, and continuously checked so the ones that fail don’t just sit there pretending to work.

Then it doesn’t stop there. You pick what fits, and the same flow handles payment and execution without forcing you into a separate setup step. Whether that payment comes from a card or directly from a wallet, the agent doesn’t have to care, it just moves forward. no jumping between systems just to complete one simple action. It stays inside one continuous path from intent to outcome.

and over time, it starts to get sharper in a way most people won’t notice immediately, because the services that actually get used and paid for naturally rise, while the ones that just look good on paper fade out.
What that actually does is hard to ignore once you see it, because most of what we call “agent workflows” today is just fragmentation disguised as automation. A chain of tools stitched together manually, held in place by the developer every time something breaks. Wayforth removes a big part of that stitching work. So instead of building around integrations, you start building around outcomes again. the agent focuses on what it needs to do, and everything underneath that layer becomes something it can just pass through instead of manage. that shift is subtle at first, but once you notice it, the old way of wiring everything together starts to feel heavier than it should have ever been.
Replies: 32 · Reposts: 3 · Likes: 119 · Views: 1.1K
Ben Callow💹🧲@ben_callow·
@Col_ASY Build in public works when it’s customer-facing documentation, not founder applause. If 80% of your followers are builders, you’ll optimise for likes. One buyer CTA + one ICP proof per post beats “looks clean” all day.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 23
Ayush Barnwal | Building
Likes from other founders: 400
Comments saying "Looks clean!": 50
Actual paying customers: 0

The "Build in Public" timeline is a great support group. It is a terrible distribution channel.

Stop marketing to your peers.
Replies: 19 · Reposts: 0 · Likes: 24 · Views: 300
Ben Callow💹🧲@ben_callow·
@revoca_ai Exactly the failure mode that's hardest to see coming. Curious how you capture the "why we did it this way" — that reasoning rarely makes it into any doc, and it's usually where the real gap is.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 4
Revoca AI@revoca_ai·
This is exactly why we are building. We are unifying all sources and making sure that employees on leave don't block others, people leaving don't create knowledge gaps, and top leadership has complete insight into the company. For the end user, we are smoothing onboarding, support and operations.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 9
Ben Callow💹🧲@ben_callow·
This week we had to reroute tasks twice after a model change. Not because the work changed, but because routing logic was scattered.

That’s the quiet cost of putting model routing inside app code: every model swap becomes a mini integration project, and your retry rates climb while you hunt down where the decision is being made.

My current take for UK SMEs building agentic automation: make your automation layer (n8n) the canonical owner of model routing.

What I mean by ‘model routing’:
- choosing which model handles which task (classify, draft, extract, critique)
- when to fall back to a cheaper or more reliable model
- when to refuse and ask for more context

Why n8n is the right place:
1) One place to change behaviour. Swap a model once, not across three services and a queue worker.
2) Observability becomes real. You can log input → route decision → model → output → retries, in one workflow.
3) Governance is simpler. You can enforce rules (PII handling, prompt versions, allowed tools) where the work is orchestrated.
4) You can add reliability patterns without a rewrite: canary routes, timeouts, circuit breakers, and ‘try local RAG first’ before you even hit a big model.

Trade-offs (it’s not magic):
- If you’re doing ultra low-latency work, embedded routing in code can still make sense.
- If your team treats n8n as a toy, you’ll recreate chaos. Workflows need versioning, reviews, and metrics like any other system.

A practical first step: pick one noisy workflow and move just the routing decision into n8n. Add a router node that selects the model, logs the decision, and tracks retry rates before/after. You’ll learn fast where the failure modes really are.

If you’re building with n8n + local RAG, I’m curious: where does your routing logic live today, and what’s your current retry rate? #n8n #AIAutomation #AgenticAI
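As a sketch of what "the router node" does, here is the same idea in plain Python, with a caller-supplied `call_model` standing in for the actual model call. The task names, fallback model, and log shape are illustrative, not an n8n API.

```python
ROUTES = {"classify": "small-model", "extract": "small-model",
          "draft": "mid-model", "critique": "mid-model"}
FALLBACK = "cheap-reliable-model"
decision_log = []   # input -> route decision -> model -> output -> retries

def route_task(task_type, payload, call_model):
    model = ROUTES.get(task_type)
    if model is None:
        # Refuse rather than guess: unknown work needs more context.
        return {"status": "refused", "reason": f"no route for {task_type!r}"}
    for retries, chosen in enumerate([model, FALLBACK]):
        try:
            output = call_model(chosen, payload)
        except Exception:
            continue                      # fall back to the cheaper/reliable model
        decision_log.append({"task": task_type, "input": payload,
                             "model": chosen, "output": output,
                             "retries": retries})
        return {"status": "ok", "model": chosen, "output": output}
    return {"status": "failed", "task": task_type}
```

Because every call goes through one function, the before/after retry-rate comparison mentioned above is just a query over `decision_log`.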
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 25
borja@borjafat·
Claude Opus 4.7 or GPT-5.5 can run your ENTIRE SEO.

Tired of chasing backlinks? Solved.
Tired of AI slop articles? Solved.
Tired of ranking for keywords that don't convert? Solved.

You just install a SKILL and you're ready to roll. Can run automated daily tasks on a schedule.

Comment "SEO" and I'll send it!
Replies: 883 · Reposts: 38 · Likes: 599 · Views: 63.4K
Matt Clifford@matthewclifford·
This is a very special team - one to watch. Galen, Devansh and co are doing something extremely ambitious. Feel very lucky to have backed them from day one with @join_ef
Standard Intelligence@si_pbc

We’ve raised 75m in new funding from Sequoia and Spark Capital—partnering with @sonyatweetybird, @MikowaiA, and @YasminRazavi, all of whom are deeply supportive of our long-term mission. We’ve also brought on angels & advisors including @karpathy, @tszzl, and @_milankovac_.

Our early results with FDM-1 moved computer use from a data-constrained regime to a compute-constrained one; this latest round of funding unlocks several orders of magnitude of compute scaling for that work. With the FDM model series we have a path to scale agentic capabilities through video pretraining, and we expect to achieve superhuman performance on general computer tasks in the same way that current language models have superhuman performance on coding tasks.

We’re also now able to invest in the blue-sky research necessary to our long-term mission of building aligned general learners. To realize the civilizationally transformative impacts of AI, models must generalize far out of their training distributions, actively exploring and building skills in new environments. This capability represents a substantial shift from the current paradigm of model training. We believe that current alignment techniques are insufficient to predictably and safely steer a model with human-level learning capabilities, and so we’re doing work to study small versions of this problem in controlled environments to develop a science of alignment for general learners.

We’re a team of 6 people in San Francisco. We’re hiring world-class researchers and engineers to help us achieve our mission. If that’s you, please get in touch.

Replies: 5 · Reposts: 4 · Likes: 87 · Views: 9.6K
Ben Callow💹🧲@ben_callow·
@n8n_io Ship one workflow with guardrails, logging, and ROI tracking, then scale — otherwise it's demo theatre.
Replies: 0 · Reposts: 0 · Likes: 3 · Views: 246
n8n.io@n8n_io·
n8n's official Claude Code connector can now create and edit workflows! This goes way beyond plugging an API into MCP. It's purpose built for LLMs. Includes a new workflow TypeScript SDK so workflows are written as code instead of JSON, with more reliable validation. Works anywhere MCPs are supported (n8n 2.18.5+). 🔗 Full video: bit.ly/42Gi0VO
Replies: 45 · Reposts: 89 · Likes: 725 · Views: 113.5K
David Roberts@recap_david·
I built a zero-person AI newsletter business that did $2,000+ in revenue last month. No team. No payroll. No freelancers. Just 4 AI agents running the entire operation (and I spend less than 4 hours a week on it).

Here's how the system works:
→ A CEO agent sets the vision and orchestrates every hire
→ A Growth Engineer scrapes local news, Reddit, and event venues into a daily JSON database
→ A Content Director reads that database, curates the best events, and writes every Thursday newsletter in my voice
→ A Sales Director fields every ad lead, generates ad creative with nano banana, and closes deals over email
→ All orchestrated through Paperclip AI & powered by Claude Code

Spokane Pulse (my local newsletter) now has 6,662 subscribers and a 47.5% open rate, almost double the industry average.

Local newsletters are quietly printing money. Naptown Scoop does $320K/year. Wichita Life clears six figures. The model is wide open in almost every city, especially when building it in an AI-native way.

If you want the full blueprint and step-by-step walkthrough video: Like, RT, and comment "PULSE" (must be following so I can DM you) and I'll send you the exact Paperclip AI company export I use to run Spokane Pulse. You can clone it, swap in your city, and ship.
Replies: 612 · Reposts: 277 · Likes: 868 · Views: 63.1K
Ben Callow💹🧲@ben_callow·
Most teams don't have an AI problem. They have a document chaos problem.

If your contracts live in email, your policies live in SharePoint, and your 'how we do it' knowledge lives in someone's head, your chatbot will hallucinate. Every time.

We keep seeing the same failure mode in UK ops teams: they buy a model, point it at a folder, and hope it behaves. Then it confidently quotes last year's process as if it's current.

The fix isn't a bigger model. It's a retrieval layer that forces the model to answer from your own sources, or say "I don't know".

The Local RAG Loop (the version that actually holds up in production):
1) Ingest: pick 10–30 high-value docs first (contracts, SOPs, top 50 support tickets). Convert to text, strip headers/footers, and store source + version.
2) Embed: create embeddings once per document version, not per question.
3) Store: keep vectors in a local store (Qdrant works well) so search stays fast and auditable.
4) Retrieve: fetch the top 3–8 chunks with metadata, then filter by recency/owner.
5) Generate: prompt the model with strict rules: cite sources, prefer "unknown" over guessing, and never answer without retrieved context.
6) Monitor: log every question, the retrieved chunks, and the final answer. Treat misses as product bugs.

n8n is the glue that makes this workable without a bespoke app. One workflow can watch a Drive folder, chunk and embed new files, upsert into Qdrant, and expose a Slack/Teams webhook that returns answers with citations.

Peer detail that matters: chunk at ~400–800 tokens, store doc_id + version in metadata, and reject retrievals older than the doc owner's last review date.

If you don't build retrieval first, you're not "adding AI". You're scaling guesswork. Most companies trying to "add AI" should build a retrieval pipeline before they touch fine-tuning, because it forces discipline in how knowledge is created and maintained.

The messy edge case nobody mentions is scanned PDFs with no clear owner. That will break your first version.

Building something similar? Reply with your biggest automation bottleneck. We read every reply. #UKBusiness #AIAutomation
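To make the "answer from sources or say I don't know" rule concrete, here is a toy Python version of the retrieve and generate steps. Word overlap stands in for embeddings and an in-memory list stands in for Qdrant, so the docs, IDs, and overlap threshold are all invented for illustration.

```python
# Toy corpus: each entry carries source + version metadata, as in the ingest step.
DOCS = [
    {"doc_id": "sop-7", "version": 2,
     "text": "refunds above 500 gbp require manager approval"},
    {"doc_id": "policy-3", "version": 1,
     "text": "vat invoices must show supplier name and invoice date"},
]

def retrieve(question, k=3, min_overlap=2):
    """Score docs by word overlap (a stand-in for vector similarity)."""
    q = set(question.lower().split())
    scored = sorted(((len(q & set(d["text"].split())), d) for d in DOCS),
                    key=lambda pair: -pair[0])
    return [d for score, d in scored if score >= min_overlap][:k]

def answer(question):
    """Never answer without retrieved context; always cite sources."""
    hits = retrieve(question)
    if not hits:
        return {"answer": "I don't know", "sources": []}   # refuse, don't guess
    return {"answer": f"Per {hits[0]['doc_id']}: {hits[0]['text']}",
            "sources": [(h["doc_id"], h["version"]) for h in hits]}
```

The off-topic question falls below the overlap threshold, so the pipeline refuses instead of hallucinating; that refusal path is the whole point of building retrieval first.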
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 17
Ben Callow💹🧲@ben_callow·
When your “AI agent” makes a bad call, it is your company that fails the audit.

We learned this the hard way building agentic automations for a team that wanted more autonomy fast. The workflows ran. The outputs looked right. Then one edge case hit (a duplicate customer + partial refund), and nobody could explain who approved what, what data the agent saw, or why the system trusted that step.

Here’s the mechanism: agents don’t fail in the clever bits, they fail at the handoffs. Identity, permissions, and approval gates are usually stitched on after the demo. That’s fine until the agent touches money, customer data, or anything that shows up in an EU AI Act or NIST AI RMF conversation.

So we started treating oversight like a muscle you train, not a policy you write. We call it the HITL Muscle Test:
1) Pick one workflow where a wrong action has a real cost (refunds, pricing, access, comms).
2) Add a hard pause at the riskiest step: human approval with a named owner and a 2-minute SLA.
3) Make the approval “replayable”: log who approved, what they saw, what they changed, and the exact input.
4) Run failure drills weekly: feed in messy cases (duplicate records, ambiguous emails, missing IDs) and score the operator decisions.
5) Only then increase autonomy, one step at a time, with rollback, and measure how often humans override.

The detail that makes this work is identity-aware orchestration. Every action is executed as a scoped principal (OIDC, least-privilege tokens), and every approval is bound to a user identity, not “the system”. If you cannot reconstruct the chain in 10 minutes, you do not have automation, you have liability.

If you want speed, train the humans first. Autonomy without oversight is just faster failure. If you cannot prove who approved an agent’s action and what they approved, you should treat that agent as a compliance risk, not a productivity tool.
Took us three attempts to get this right, and the fixes were mostly process and identity, not model choice. This took us weeks to build. Took you 3 minutes to read. If it was worth it — repost it for someone who needs it. #UKBusiness #AIAutomation
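The "replayable" approval is the part teams usually skip, so here is a minimal Python sketch: the log binds a named approver to a hash of exactly what they saw. Function names and the log shape are mine for illustration, not any particular product's API.

```python
import hashlib
import json
import time

approval_log = []   # who approved, what they saw, and the exact input

def request_approval(action, payload, approver, decide):
    """Hard pause: `decide` is the human decision (a UI or Slack prompt in practice)."""
    shown = json.dumps(payload, sort_keys=True)   # exactly what the approver saw
    approved = bool(decide(action, payload))
    approval_log.append({
        "action": action,
        "approver": approver,                     # a user identity, not "the system"
        "input": shown,
        "input_hash": hashlib.sha256(shown.encode()).hexdigest(),
        "approved": approved,
        "ts": time.time(),
    })
    return approved

def guarded_execute(action, payload, approver, decide, do):
    if not request_approval(action, payload, approver, decide):
        return "blocked"   # rejected actions never reach the side effect
    return do(payload)
```

Reconstructing the chain later is then a log query: the hash proves the approver saw the same input the agent acted on.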
Replies: 0 · Reposts: 1 · Likes: 2 · Views: 28
Ben Callow💹🧲@ben_callow·
@JulianGoldieSEO I buy the ‘infra shift,’ but long context just moves the bottleneck to messy source-of-truth data and change control. 1M cheap tokens make it easier to build brittle automations faster unless you pair them with strong data contracts and human-in-the-loop approvals.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 9
Julian Goldie SEO@JulianGoldieSEO·
DeepSeek V4 just changed the AI market overnight. 1.6 trillion parameters. 1 million token context window. Open source access. Pricing so low it makes old AI workflows look expensive. The big shift is not just better answers. It is that huge automation systems now become affordable for creators, developers, and business owners. You can load full codebases, months of business data, entire content libraries, and massive research archives into one workflow. No messy chunking. No endless copy pasting. No breaking everything into pieces. This is where AI stops being a chatbot and starts becoming business infrastructure. DeepSeek V4 is not just another model update. It changes what is worth building with AI in the first place.
Replies: 3 · Reposts: 3 · Likes: 11 · Views: 1.2K
Ben Callow💹🧲@ben_callow·
Local Slack search is great, but the win is governance: you need per-agent scopes and retention-aware indexing, otherwise you’ve just built a perfect leakage machine. I’m curious how you’d enforce ‘right-to-be-forgotten’ style deletion and role-based access when multiple agents share the same SQLite store.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 11
Matthew Berman@MatthewBerman·
Never lose a conversation again in Slack. NEW KIT: Search your entire Slack history from any agent or CLI. Syncs messages locally into SQLite with FTS5 for fast, private, workspace-specific search. journeykits.ai/browse/kits/ma…
Replies: 7 · Reposts: 0 · Likes: 23 · Views: 4.3K
Ben Callow💹🧲@ben_callow·
The failure nobody mentions: confidence scores drift. A supplier changes their invoice layout, your threshold stays fixed, and docs silently misclassify. Build a weekly drift check. If mean confidence drops >10%, your prompts need updating.
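That weekly check fits in a few lines. A sketch, assuming you already log a confidence score per document (the 10% threshold and the score lists are illustrative):

```python
def weekly_drift_check(baseline_scores, current_scores, max_relative_drop=0.10):
    """Flag drift when mean confidence falls more than max_relative_drop vs baseline."""
    baseline = sum(baseline_scores) / len(baseline_scores)
    current = sum(current_scores) / len(current_scores)
    drop = (baseline - current) / baseline
    return {"baseline_mean": baseline, "current_mean": current,
            "relative_drop": drop, "drifted": drop > max_relative_drop}
```

Run it on a stable baseline window versus this week's scores; a `drifted` result is the cue to re-check prompts against the new invoice layouts, not to quietly nudge the threshold.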
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 13
Ben Callow💹🧲@ben_callow·
Your accounts don’t go out of control on the day you “miss” an invoice. They go out of control 6 weeks later, when the VAT return is due, the supplier is chasing, and you realise the numbers were wrong and no one can tell you why.

We’ve been building invoice and receipt automations recently. The hard truth is this: extraction is rarely the hard bit. Control is.

What actually breaks in the real world:
- receipts with half the VAT line missing
- supplier names that vary by branch
- credit notes that look like invoices
- PDFs that are scans, photos, or three pages of noise
- “total” fields that are right, but the line items are nonsense

Pure OCR gives you text, but not meaning. A pure LLM gives you meaning, but not guarantees. So we stopped treating docs like “inputs” and started treating them like untrusted data that needs an audit trail and a recovery path.

The Guarded IDP Loop (5 steps):
1) Capture + fingerprint: store the raw file, hash it, and assign an idempotency key.
2) OCR for coverage: extract text + layout, keep coordinates, and keep the raw image for later review.
3) LLM normalisation: convert to a strict JSON schema (supplier, date, net, VAT, gross, currency, line items) with types and required fields.
4) Validate + score: run rules (VAT maths, date sanity, duplicates, PO match, supplier allowlist) and produce a confidence score.
5) Route + learn: auto-post only above a threshold, otherwise push to a review queue and log the correction so the next prompt gets better.

Peer signal: in n8n, make validation a first-class node, write failures to an “exceptions” table, and track LLM spend per document (node-level usage) so you can cap cost before it caps you.

Automate the happy path, engineer the recovery path. If your document automation can’t explain itself and recover cleanly, it’s not automation, it’s a hidden control failure. We rebuilt ours twice before we stopped trusting “looks right” outputs.
Want the automation audit template we use with UK SME clients? It's free. Reply AUTOMATE below and I'll DM you the free audit template. #UKBusiness #AIAutomation
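The validate-and-score step can be sketched in a few lines of Python. The rules, weights, and threshold here are invented for illustration; in n8n this would be the body of a dedicated validation node.

```python
def validate_invoice(doc, seen_hashes, supplier_allowlist, threshold=0.8):
    """Run rule checks, produce a confidence score, and pick a route."""
    issues = []
    if doc["hash"] in seen_hashes:
        issues.append("duplicate")               # same file already processed
    if doc["supplier"] not in supplier_allowlist:
        issues.append("unknown-supplier")
    if abs(doc["net"] + doc["vat"] - doc["gross"]) > 0.01:
        issues.append("vat-maths")               # net + VAT must equal gross
    score = max(0.0, 1.0 - 0.4 * len(issues))    # crude score: each issue costs 0.4
    route = "auto-post" if score >= threshold else "review-queue"
    return {"score": score, "issues": issues, "route": route}
```

Anything routed to the review queue lands in the exceptions table; the corrections logged there are what make the next prompt version better.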
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 22