Kenneth Ballenegger

Kenneth Ballenegger@kob·22h

Codex GPT 5.5 as a coding model is basically as good as Claude these days, but Claude has a much better harness. What drives me crazy with both is when they constantly block everything I try to do for "cyber risk". It's getting dystopian to have Big Brother deciding what I'm allowed to code.

Mau “Rules without Rulers” Ledford@krunkosaurus·1d

Codex is a capable, all-you-can-eat buffet, while Claude Code is getting more restricted every day, with Anthropic secretly nerfing models/settings and now charging you for agent use at 10x API rates. Their MO seems to be to offer you less while charging you more as time passes. Claude is a better product run by a worse CEO and logistics planner, though a better tech geek. Access to GPUs is where Anthropic failed, and it continues to handle that in the shiftiest way possible. The xAI deal was crazy, especially given the smack Elon talks about “Misanthropic”, but hopefully it puts Claude on a better, more stable path. Competition is important.

Kenneth Ballenegger@kob·22h

@krunkosaurus Yup, that's usually the issue. I have a shortcut to disable and enable my DNS override for this very reason. But these days I'd probably just tether to my phone; it's easier than dealing with frustrating hotel wifi.

Mau “Rules without Rulers” Ledford@krunkosaurus·1d

Cool "local LLM saved me" story: I'm at a hotel in a remote country and the wifi works on my phone, but the hotel's captive portal auth just won't load on my laptop. Just can't get it to trigger. There is literally no one who can help you. With no internet, I crack open OpenCode + Qwen3 27b 8-bit via LMStudio and it begins debugging. It says the wifi is operational but DNS is blocked for some reason, so the captive portal can't trigger. It proceeds to find the captive portal's local IP address and adds an override to my /etc/hosts:

12.35.64.44 hotel.wifi.network

And boom. I'm auth'd and online. When I get access to Codex, I tell it to debug further, and it turns out I have a global 1.1.1.1 DNS override in my wifi settings, which the hotel wifi was blocking. Local LLM saves the day.
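The hosts-file override from the story can be sketched roughly like this. It is a minimal illustration that assumes the usual hosts-file layout (IP address, whitespace, one or more hostnames). The helper name `add_hosts_override` is hypothetical, and it works on text instead of touching /etc/hosts, so it is safe to run anywhere.

```python
def add_hosts_override(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text containing an `ip hostname` entry, added only if missing."""
    for line in hosts_text.splitlines():
        # Strip trailing comments, then split into the IP and its hostnames.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return hosts_text  # mapping already present, leave the file alone
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"


# Values taken from the post; the starting file content is made up.
updated = add_hosts_override("127.0.0.1 localhost\n", "12.35.64.44", "hotel.wifi.network")
```

Calling the helper a second time with the same arguments returns the text unchanged, which keeps repeated debugging runs from stacking duplicate entries.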

Kenneth Ballenegger@kob·1d

I wanted to build a hotel booking search system.

The obvious version is: give Claude a browser, a few APIs, web search, maybe some credentials, and ask it to figure things out. It runs searches, opens tabs, compares results, retries when pages break, and eventually comes back with an answer.

I don't like that architecture at all.

The way I think about this is different: the agent shouldn’t be “doing travel search.” The agent should be using a travel search system.

So the durable part is code. There’s a CLI for search jobs. A hotel search becomes a structured job: destination, dates, constraints, loyalty programs, required filters, output format. The system fans out across providers in parallel, normalizes results, validates invariants, records logs, and produces a reviewable output.

No language model in the search loop. No vibes-based browsing. No “the agent clicked around and thinks this is the best option.”

The LLM’s job is at the boundary:
- understand the natural-language request
- pull missing details from context when appropriate (e.g. "search hotels for my next trip": it knows what my next trip is)
- translate that into a precise query
- call the deterministic system
- summarize the result for me

That’s the pattern I keep coming back to. Agents are great at intent, context, and judgment. Code is better at retrieval, validation, retries, parallelism, logging, and repeatability.

A lot of “agentic automation” becomes much more reliable once the agent stops being the worker and becomes the interface to systems that are built to do the work.
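A rough sketch of what the deterministic side could look like: a structured job, a parallel fan-out, invariant checks, and a sorted result. Everything named here (`HotelSearchJob`, `Offer`, `run_job`, the stub providers, the example hotels and rates) is hypothetical and only illustrates the shape; the post does not describe the real system's code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class HotelSearchJob:
    destination: str
    check_in: date
    check_out: date
    max_nightly_rate: float  # one example constraint


@dataclass(frozen=True)
class Offer:
    provider: str
    hotel: str
    nightly_rate: float


Provider = Callable[[HotelSearchJob], list[Offer]]


def run_job(job: HotelSearchJob, providers: dict[str, Provider]):
    # Validate the job itself before any provider is called.
    if job.check_out <= job.check_in:
        raise ValueError("check_out must be after check_in")

    # Fan out: one thread per provider; leaving the `with` block waits for all.
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, job) for name, fn in providers.items()}

    offers, failures = [], {}
    for name, future in futures.items():
        try:
            offers.extend(future.result())
        except Exception as exc:  # a failed provider is recorded, not fatal
            failures[name] = str(exc)

    # Enforce invariants, then sort for a stable, reviewable output.
    valid = [o for o in offers if 0 < o.nightly_rate <= job.max_nightly_rate]
    valid.sort(key=lambda o: (o.nightly_rate, o.provider))
    return valid, failures


# Stub providers standing in for real integrations.
def provider_a(job):
    return [Offer("a", "Harbour Hotel", 180.0), Offer("a", "Grand Palace", 450.0)]


def provider_b(job):
    raise TimeoutError("upstream timed out")


def provider_c(job):
    return [Offer("c", "Harbour Hotel", 175.0)]


job = HotelSearchJob("Tokyo", date(2025, 3, 1), date(2025, 3, 4), 300.0)
offers, failures = run_job(job, {"a": provider_a, "b": provider_b, "c": provider_c})
```

The model never appears in this loop. It would only build the `HotelSearchJob` from the request and summarize `offers` and `failures` afterwards.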

Kenneth Ballenegger@kob·2d

I've been playing with this lately, and it is indeed the best way to do video gen right now. My advice is to install the Sogni Creative Agent Skill into your Hermes/OpenClaw and use it from your own process: much better UX than using a bunch of websites and copy-pasting things all over.

Sogni.ai@Sogni_Protocol

The best AI storyboard-to-video workflow on the internet right now is GPT Image 2 → Seedance 2.0. We just made it the default at chat.sogni.ai - connecting with ByteDance and OpenAI to bring both into one chat-first creative engine. Type an idea. Get a storyboard. Get cinematic video. Refine. Ship the ad concept. Minutes, not days. Two things make Sogni different: → No subscription. Higgsfield bills you $29/mo whether you generate one clip or a hundred. Sogni is pay-per-generation on a people-powered consumer GPU network - frontier models when you need them, open-source models like LTX2.3 when you don't. → Same workflow, from your agent. Tell Claude, Codex, or Hermes to "make me a 6-shot storyboard and turn it into video" - the Sogni Creative Agent Skill runs the whole pipeline. chat.sogni.ai - try it free, no card.

Kenneth Ballenegger@kob·2d

One tidbit that exemplifies this: my favorite new feature that Hermes released is no_agent cron jobs. I have 50+ cron jobs, and once that feature shipped, all but five were immediately switched to no_agent jobs.

Kenneth Ballenegger@kob

The more I build Klaw, the more I think durable agents are codebases with language interfaces, not prompts with tool access.

The default agent pattern still feels like: Write a big prompt. Give the model a pile of tools. Hope it reasons through the workflow correctly every time.

That works for demos. It’s not a great foundation for anything you want running every day.

For repeatable work, the agent should not be rediscovering the procedure from scratch. It should be calling something known. A script. A CLI. A queue worker. A typed adapter. A deterministic parser. A database query. A narrow classifier. A job with logs, retries, validation, and boring failure modes.

Then the language model does the part it’s actually good at: summarizing messy inputs, drafting text, classifying ambiguous cases, ranking options, explaining results, or choosing between known paths.

This sounds less magical. I think it’s much closer to how useful personal agents actually work.

Code gathers the data. Code validates it. Code computes the numbers. Code checks source-of-truth state. Code handles retries and side effects.

Then, when needed, the model gets a narrow job: “Summarize this.” “Classify this into one of these categories.” “Explain these options.” “Draft the reply, using this evidence.” “Choose the next step from this list.”

That split matters. If an LLM is doing the math, checking the state, deciding which source of truth matters, and writing the final answer all in one big mushy pass, you’ll eventually get weird failures. If code computes the answer and the LLM explains it, the system is much easier to trust.

Same for email, travel, finance, contacts, reminders, dashboards, approvals. Basically anything personal enough that being wrong is annoying or expensive.

The job of the agent is not to be clever at every step. The job is to know which parts should be deterministic and which parts need judgment.

This also changes how “memory” works. A prompt-first agent wants to stuff more context into the model. A code-first agent asks: Where is the source of truth? What query should retrieve it? What is the minimum useful context? What evidence should be attached to the result? What should be logged so we can debug this later?

That is a very different product. It’s cheaper. It’s faster. It’s easier to test. It’s easier to audit. It fails in more obvious ways. And when something breaks, you fix the primitive instead of rewriting vibes into a longer system prompt.

Natural language still matters. A lot. The whole point is that I can ask Klaw for an outcome in plain English, and it can assemble context, choose the right workflow, run it, and explain what happened. But once a pattern repeats, it should graduate out of prompt-land and into code. That’s where agents start becoming infrastructure instead of chat sessions.

And there’s a second-order effect I think people underweight: once the agent is not just a pile of prompts, you can build real software on top of it. The interface does not have to be a chatbot. It can be chat. It can be a mobile app. It can be a dashboard. It can be a button. It can be a background job that just does the thing.

All the automations, permissions, adapters, databases, logs, and weird little connections between systems already exist underneath. The parts that need judgment can still call the LLM through the harness. But the product surface can be whatever makes sense.

That’s the part I keep coming back to. The best personal agents will probably feel conversational at the edge and boring underneath. And once they’re boring underneath, they stop being just agents. They become a way to build software.
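One way to picture the "code computes, the model explains" split is the sketch below. It is an illustration under assumptions, not Klaw's code: the function names are hypothetical, amounts are integer cents to keep the arithmetic exact, and the model is passed in as a plain callable so no particular LLM API is implied.

```python
from collections import defaultdict
from typing import Callable


def spend_by_category(transactions: list[tuple[str, int]]) -> dict[str, int]:
    """Deterministic part: total spend per category, in integer cents."""
    totals: dict[str, int] = defaultdict(int)
    for category, cents in transactions:
        totals[category] += cents
    return dict(sorted(totals.items()))


def explain(totals: dict[str, int], ask_model: Callable[[str], str]) -> str:
    """Narrow model job: describe numbers that code has already computed."""
    evidence = "\n".join(f"{cat}: {cents / 100:.2f}" for cat, cents in totals.items())
    prompt = (
        "Summarize these spending totals in one sentence. "
        "Do not change or recompute any number.\n" + evidence
    )
    return ask_model(prompt)


# A stand-in model that records the prompt it was given.
seen_prompts: list[str] = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "summary goes here"


totals = spend_by_category([("food", 1250), ("travel", 40000), ("food", 750)])
reply = explain(totals, fake_model)
```

Because the totals exist before the model is called, they can be tested, logged, and attached to the answer as evidence, whatever the model ends up writing.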

@jinnykang Too many interesting things to talk about to stay quiet these days. We live in exciting times! It's good to build again.

Jinny@jinnykang·3d

@kob Yes!!! You’re back!

I've spent the last few months building the best AI agent imaginable. It very much goes against conventional wisdom — the abstraction layers popularized here on X don't make that much sense to me. I've developed a few components that deserve to be open sourced. So starting today I'm going to start sharing a bit more about it. Stay tuned 🦅

What I built with Klaw 🦅

Klaw started as a personal AI agent, but it has grown into a private operating system for my life: part executive assistant, part travel desk, part finance system, part media brain, part coding team, part household computer, and part creative studio.

The coolest pieces:
- Cryptographic approvals for safety: Klaw can take real actions, but dangerous operations go through signed approval flows. Even if the agent goes rogue, it can’t just send emails, approve actions, or mutate important systems without cryptographic authorization.
- Travel brain: one integrated travel system for flights, hotels, points, award searches, loyalty accounts, promos, folios, flight status, gates, terminals, hotel contacts, upcoming-stay alerts, and timezone/location awareness. It knows where I am, what trip is next, what points I have, and what travel context matters.
- Email + work intelligence: Klaw reads, triages, labels, archives, drafts, and routes email across my personal and work life. It handles dealflow, investor updates, forwarding instructions, document imports, and approval-gated outbound replies.
- Mind Project: a personal knowledge graph for my life. It stores ideas, references, lists, plans, docs, and project context, and acts as the brain behind everything Klaw does. Every feature, data source, and personal workflow can feed into or draw from Mind.
- Household butler + smart kitchen: a WhatsApp-based household ops system that processes receipts, tracks purchases, manages whiskey and wine cellar inventory, maintains shopping lists, and powers an iPad app that lives on my fridge as a smart kitchen interface.
- Finance + trading systems: real-time credit card monitoring, spending tracking, subscription detection, financial notifications, and automated market/trading systems including Polymarket scanning, analysis, and execution logic.
- Media brain: a private media server streaming to all my devices over Tailscale, plus TV/show tracking, episode discovery, concert and festival tracking, favorite-artist monitoring, and event discovery.
- Personal CRM: tracks people, contact details, interactions, relationship context, and follow-up cadences. It can ingest people from emails, signatures, or business cards.
- Chinese tutor: a dedicated WhatsApp tutor with vocab drills, spaced repetition, quiz grading, word-of-the-day, and conversation practice.
- Document + investor update vault: automatically detects signed documents, archives them, classifies personal vs work docs, and ties into Utopian workflows like automatic investor update importation.
- Creative generation studio: generates images, video, voice, text-to-speech, diagrams, and cloned/personalized voices using multiple backends, including local and remote models.
- Local model infrastructure: we host local models and services for voice, vision, language, media, and other private AI workloads.
- Coding minions: Klaw can spawn background Claude Code agents to build features and debug systems as needed, all in parallel.
- Dashboard + mobile OS interface: every major feature shows up in a first-class web dashboard and mobile app. It’s the operating-system interface for the whole personal automation stack.
- Operations layer: health checks, service monitoring, cron jobs, launch agents, database backups, private networking, Tailscale routing, and deployment workflows keep the whole thing running.
- Web deployment engine: Klaw also builds, runs, and deploys dozens of websites and internal apps across personal, work, and creative projects.

The interesting part isn’t any single feature. It’s that all of these systems talk to each other: email feeds travel, travel feeds reminders, finance feeds household ops, Mind gives context to everything, approvals keep actions safe, local models keep private workloads close, and coding minions can extend the system itself.

Klaw isn’t just a chatbot anymore. It’s a private AI operating system for my life.
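The signed-approval idea from the first bullet can be illustrated with a small sketch. The post does not say which scheme Klaw uses, so this is an assumption: HMAC-SHA256 with a secret that lives only on the approver's device stands in for whatever signing is really used (an asymmetric signature would be a natural alternative). The function names and the example action are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(action: dict) -> bytes:
    """Serialize an action deterministically so equal actions sign identically."""
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode()


def sign_approval(secret: bytes, action: dict) -> str:
    """Runs on the approver's side: bind the approval to this exact action."""
    return hmac.new(secret, canonical(action), hashlib.sha256).hexdigest()


def is_approved(secret: bytes, action: dict, signature: str) -> bool:
    """Runs in the executor: refuse unless the signature matches the action."""
    expected = sign_approval(secret, action)
    return hmac.compare_digest(expected, signature)


secret = b"held-by-the-approver-only"  # assumption: never visible to the agent
action = {"type": "send_email", "to": "someone@example.com", "subject": "Hello"}
signature = sign_approval(secret, action)

# The agent cannot reuse an approval for a different action.
tampered = {**action, "to": "attacker@example.com"}
```

The point of binding the signature to the serialized action is that approving one email does not approve a slightly different one.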

The more I build Klaw, the more I think personal agents need boring operating-system primitives more than they need better prompts.

A chatbot can answer a question. An agent has to live in messy reality. Emails arrive out of order. Credentials expire. Websites change flows. Background jobs fail halfway through. Some actions need approval. Some tasks should never run twice. Private data needs to stay private.

That changes the whole design. The hard part is not making the model sound smart. The hard part is making the system safe enough to operate over time.

A few things I keep coming back to:

Human approval boundaries. The agent should draft, prepare, check, summarize, recommend, and then stop when the next step is sensitive. Publishing, spending money, sending messages, deleting data, changing state. Those should usually have an explicit approval point. That is not a UX tax. It is the trust layer. The goal is not "AI does everything." The goal is "AI takes the work as far as it safely can, then gives the human the right decision at the right moment."

Fail-closed behavior. If something is ambiguous, the agent should not improvise. If a parser is unsure, a login flow changed, a transaction looks weird, or a source conflicts with another source, the right behavior is often to stop. Not guess. Not hallucinate. Not push through because the demo would look better. Stop, preserve the evidence, explain the block, ask if needed.

Queues and ordering. A lot of personal automation is event driven: emails, receipts, reminders, calendar changes, alerts, messages. Once you have more than one thing happening at once, you need boring stuff: durable queues, retries, idempotency, logs, ordering. Otherwise "smart automation" becomes race conditions with a friendly interface.

Provenance. If the agent summarizes something, updates a dashboard, imports data, or takes action based on a source, it should know where the information came from. Not because every personal system needs enterprise compliance. Because future-you will need to debug it. After a few months, provenance becomes memory.

Private data minimization. The most useful personal agents touch the most sensitive data: inbox, travel, money, contacts, family, location, documents. So the architecture has to minimize what gets exposed, logged, committed, or sent to tools. Redaction is not a cleanup step. It is part of the product.

Recovery. If an agent becomes part of your daily operating layer, backups are no longer just ops hygiene. Can you restore the memory? The queues? The configs? The audit trail? The local state? It is not enough that the code is in git. The agent’s lived context matters too.

This is where I think the personal AI conversation is still too demo-driven. The demo is: "look, the agent booked something." The product question is: can it handle the 500th booking, the weird edge case, the duplicate event, the failed login, the approval boundary, the rollback, and the restore? That is the difference between an impressive assistant and a dependable personal operating system.

Klaw is making me more convinced that the future of agents is not just more autonomy. It is bounded autonomy. More capable where the system has confidence. More conservative where the stakes are high. More observable when something breaks. More careful with private context. More willing to hand control back to the human.

The best personal agents probably will not feel like magic all the time. They will feel like infrastructure that occasionally uses magic.
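Two of the primitives above, idempotency ("some tasks should never run twice") and fail-closed handling, fit in a few lines. This is a sketch under assumptions: the `ReceiptWorker` class, the event fields, and the in-memory `seen` set are all hypothetical, and a real system would keep this state in a durable store instead of process memory.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiptWorker:
    seen: set = field(default_factory=set)      # idempotency keys already applied
    ledger: list = field(default_factory=list)  # side effects that actually happened
    blocked: list = field(default_factory=list) # evidence kept for human review

    def handle(self, event: dict) -> str:
        key = event["id"]
        if key in self.seen:
            return "duplicate"  # redelivery or retry: do nothing a second time

        amount = event.get("amount_cents")
        if not isinstance(amount, int) or amount <= 0:
            # Fail closed: do not guess at a missing or odd amount.
            self.blocked.append(event)
            return "blocked"

        self.ledger.append((key, amount))
        self.seen.add(key)  # mark as done only after the side effect succeeded
        return "applied"


worker = ReceiptWorker()
results = [
    worker.handle({"id": "r-1", "amount_cents": 1299}),
    worker.handle({"id": "r-1", "amount_cents": 1299}),  # same event delivered twice
    worker.handle({"id": "r-2", "amount_cents": None}),  # parser was unsure
]
```

Blocked events are deliberately not added to `seen`, so a corrected version of the same event can still be applied later.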

Kenneth Ballenegger@kob·8 May

Huge congratulations to Kevin, Daren, and the entire Reap team on this incredible outcome! As one of their earliest investors, we saw from the beginning that they had the ambition and execution to build something globally relevant from Hong Kong. Big startup wins from Hong Kong are still too rare, which makes this milestone especially meaningful for the local ecosystem. Excited to see what Reap builds next with Payward / Kraken.

Reap@reapglobal

Reap is joining @Payward, @krakenfx's parent company. We started Reap in 2018 with a conviction: stablecoins will be core infrastructure for global payments. Now, with the Payward ecosystem, we’ll move faster toward a more open, continuous global financial system. Same team. Same product. Bigger reach. We’re just getting started. Reap to the moon 🚀 Learn more 👉 reap.global/newsroom/paywa…

Kenneth Ballenegger reposted

Reap@reapglobal·7 May

Reap is joining @Payward, @krakenfx's parent company. We started Reap in 2018 with a conviction: stablecoins will be core infrastructure for global payments. Now, with the Payward ecosystem, we’ll move faster toward a more open, continuous global financial system. Same team. Same product. Bigger reach. We’re just getting started. Reap to the moon 🚀 Learn more 👉 reap.global/newsroom/paywa…

Kenneth Ballenegger@kob·4 May

Built a new little project called The Travel Signal (thetravelsignal.com): a travel-hacking news aggregator that watches the main points blogs, filters out the credit-card promo fluff, and summarizes only the actual useful news. If multiple blogs cover the same thing, it merges them into one concise brief with links to the original sources. Basically: travel loyalty news, without the noise. Still MVP, but it’s live. Let me know what y’all think.
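The post does not say how The Travel Signal decides that two blogs cover the same story. Purely as an illustration, here is one simple way to merge overlapping coverage: group headlines by word overlap (Jaccard similarity) and keep every source link. The threshold, function names, headlines, and URLs are all made up for the example.

```python
import re


def tokens(title: str) -> set[str]:
    """Lowercased words and numbers in a headline."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_stories(stories: list[tuple[str, str]], threshold: float = 0.5) -> list[dict]:
    """Greedily fold each (title, url) into the first brief it resembles."""
    briefs: list[dict] = []
    for title, url in stories:
        words = tokens(title)
        for brief in briefs:
            if jaccard(words, brief["tokens"]) >= threshold:
                brief["sources"].append(url)
                break
        else:  # no existing brief matched, so start a new one
            briefs.append({"title": title, "tokens": words, "sources": [url]})
    return [{"title": b["title"], "sources": b["sources"]} for b in briefs]


briefs = merge_stories([
    ("Hyatt announces new award chart", "https://blog-a.example/hyatt"),
    ("New Hyatt award chart announced", "https://blog-b.example/hyatt-chart"),
    ("Delta adds Hong Kong route", "https://blog-c.example/delta"),
])
```

Word overlap is crude and will miss paraphrased headlines; it is shown only because it is deterministic and easy to test.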

Kenneth Ballenegger@kob·20 Mar

Gave my AI agent email access this week. Spent more time thinking about what it shouldn't be able to do than what it can. Wrote up the architecture — zero trust, cryptographic approval gates, and why the UX of the approval flow is part of the security design. gist.github.com/kballenegger/0…

Kenneth Ballenegger@kob·18 Feb

@RazvenHK @grok what does the full article say
