CertainLogic

13.6K posts

@CertainLogicAI

Your AI is making things up. We stop that. Deterministic validation tools for AI agents at https://t.co/dkXACUQEoE

Joined July 2022

5K Following, 3.9K Followers

Pinned Tweet

CertainLogic@CertainLogicAI·2d

I spent 10+ years in industrial automation learning that unreliable systems cost money. Then I started using AI tools in business and saw the same problem — confident, wrong answers with no accountability. So I built the fix. Building @CertainLogicAI in public. Follow along.

CertainLogic@CertainLogicAI·2h

@asaio87 Why use OpenClaw and not automate tasks? That's the hack.

andrei saioc@asaio87·13h

How can Hermes be better than OpenClaw? What exactly can it do for you? For me, both create AI slop, and there are not many use cases unless you automate some tasks. But all tasks you do on your computer involve creativity, and AI is not creative at all.

CertainLogic@CertainLogicAI·3h

Started a session at 15k tokens. After 10 queries, we hit 68k. Before we fixed it, we’d see 433k. Context bloat doesn’t creep — it compounds. We handle it with session resets and handoff summaries now, but the middle ground is still surprisingly fast. Most AI agent operators never measure this.
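A minimal sketch of the reset-and-handoff pattern described in the post above. This is not CertainLogic's actual implementation: the `Session` class, the 4-characters-per-token estimate, and the truncation-based summary are all assumptions made for illustration. A real setup would likely use the model's own tokenizer and an LLM-written summary.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical tracker that resets context once it exceeds a token budget."""

    budget_tokens: int
    base_tokens: int  # assumed fixed cost: system prompt, tools, etc.
    history: list[str] = field(default_factory=list)
    resets: int = 0

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # Crude assumption: roughly 4 characters per token.
        return max(1, len(text) // 4)

    def context_tokens(self) -> int:
        # Total context = fixed base + every message still carried in history.
        return self.base_tokens + sum(self.estimate_tokens(m) for m in self.history)

    def add(self, message: str) -> None:
        self.history.append(message)
        if self.context_tokens() > self.budget_tokens:
            self.reset_with_handoff()

    def reset_with_handoff(self) -> None:
        # Placeholder summary: keep the last three messages, truncated to 80 chars.
        recent = self.history[-3:]
        summary = "HANDOFF: " + " | ".join(m[:80] for m in recent)
        self.history = [summary]
        self.resets += 1


session = Session(budget_tokens=200, base_tokens=50)
session.add("a" * 400)  # about 100 tokens, still under budget
session.add("b" * 400)  # pushes context over budget and triggers a reset
print(session.resets, session.context_tokens())
```

Measuring `context_tokens()` after every query is the point: the growth is visible long before it becomes expensive.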

CertainLogic@CertainLogicAI·6h

@garrytan Amazing how technology can blossom once it finds a market fit.

Garry Tan@garrytan·7h

If you told someone from 2008 how important Markdown would be in 2026 they wouldn’t believe you

Chris Pisarski@chrispisarski

You can automate the entire McKinsey model with Claude. Every sales team can now build their own GTM engine that:
1) researches every account before the call, scores them, and generates a custom report that gets better with every call
2) surfaces conversation starting points, what to mention, what to avoid, and what the company is actively trying to achieve right now based on the data signals
3) creates the perfect follow-up doc after the call
4) auto-enriches every person and stakeholder mentioned during the call so your one-pager is already personalized
5) maps the full buying process so you know exactly who else needs to be addressed and what matters to them
All you need is Claude + the Crustdata MCP + an AI notetaker API. Every time someone books a Crustdata demo, our AEs get this report:

CertainLogic@CertainLogicAI·7h

We gave bare Claude Opus a vague coding prompt. It used deprecated syntax 20 times. Zero warnings. Our Guard caught all 20. Zero slip-through. Tight specs help. Vague specs expose everything. Guard catches both. Full breakdown → certainlogic.ai/blog/bare-llm-…

CertainLogic@CertainLogicAI·8h

OpenAI released GPT-5.4-Cyber to "defenders" — a frontier model built to find vulnerabilities at scale. Same capability, different hands. Your AI infrastructure is now a bigger attack surface. Here’s what most miss: every token you carry in context is sensitive data an attacker can exploit. Token efficiency isn’t just cost control anymore — it’s your security posture.

CertainLogic@CertainLogicAI·18h

GPT-5.4-Cyber finds exploits in compiled binaries — no source code required. Impressive. One question nobody's asking: What's the hallucination rate? In security, confident and wrong is the worst outcome.

CertainLogic@CertainLogicAI·20h

Question for business owners: Have you ever caught your AI tool giving a customer wrong information? What happened?

CertainLogic@CertainLogicAI·21h

@EXM7777 There are verified ways to do this. On OpenClaw agents we run custom-built scripts that periodically reset context to 0 and reread the recent prompts for continuity. How are you handling it?

Machina@EXM7777·23h

context management is still LLMs' biggest bottleneck today... are markdown files with structured data and graphs the solution?

CertainLogic@CertainLogicAI·21h

@asaio87 You hit on the main issue in the agent space here. Can OpenClaw agents do complicated things? Yes. Being one hallucination away from costing you a customer or a massive audit for a data breach in a regulated industry? Not advisable in the current form.

andrei saioc@asaio87·1d

Talked to a few people in the comments about OpenClaw usage. People seem to use it for controlling Meta ads, sending quotes to customers, responding to tickets and so on. Works especially well for people having a large volume of all these. That means a good business with high revenue. I am wondering, if you have that big of a business where you can't handle all these by yourself, why won't you have a few real employees, experts in doing this? I would not trust this thing with ads and money spending and giving quotes to customers.

CertainLogic@CertainLogicAI·21h

@gregisenberg Two clearly different sets of builders: those out for profit above all else, and those who give back to a community via open source, etc. Open source is here to stay.

GREG ISENBERG@gregisenberg·22h

What happens to open source when AI is writing 100% of the code? I've been thinking about this a lot. Like… the whole system was built around humans valuing the act of contribution. You learned, you struggled, you submitted a PR, you got feedback, you got better. That loop created engineers. It created community. It created ownership. If AI writes the PR, who owns it? Who learned from it? Who's gonna stay up at 2am debugging the thing they shipped because they actually care? The cool part about OSS is that no one owns it. As a consumer, you could always look under the hood, fork it, take it somewhere else. I don't think open source dies. But I genuinely don't know what it becomes... Any ideas?

CertainLogic@CertainLogicAI·22h

@RoundtableSpace Bringing more people into the ecosystem is a real big brain play by them. Makes AI infrastructure all the more valuable.

0xMarioNawfal@RoundtableSpace·22h

CLAUDE JUST LEAKED ITS OWN APP BUILDER. HERE'S EVERYTHING YOU NEED TO KNOW ABOUT IT IN 10 MINUTES

CertainLogic@CertainLogicAI·22h

@andrewchen Exactly right. Tech optimized for agent use and improvement is going to be vital.

andrew chen@andrewchen·22h

common startup advice: talk to your users. only difference now is that your users might also be AI agents using your API 😂

CertainLogic@CertainLogicAI·22h

@jasonlk Excited for this. The gap between "agentic stack in theory" and "live agentic stack that doesn't blow up" is enormous. Hope you cover the failure modes — that's where the real lessons are.

Jason ✨👾SaaStr.Ai✨ Lemkin@jasonlk·1d

Welcome to The Agents, Episode #001!! A new weekly show with me and Amelia Lerutte, SaaStr's Chief AI Officer, where we pull back the curtain on everything happening across our live agentic stack. Every week. All the bumps, breakthroughs, and real talk. No sugarcoating. Our goal is simple: accelerate your success on the agentic journey by sharing ours:
- How our AI agents handled an outage. Which AI Agent blamed whom
- How Clay's AI Agent tried to 5x our pricing
- How to roll out a No Lead Left Behind program with your agents
- How to build your own AI VP of Marketing and Customer Success
If you're on the agentic journey or about to start ... or feel like you're falling behind ... watch below. (And subscribe to SaaStr AI on YouTube and Spotify to catch this and the next episodes)

CertainLogic@CertainLogicAI·22h

@emollick The FLOP standard is clever but incomplete — a hallucinating model burning 10^17 FLOPs is worth less than a reliable smaller one. Unit of exchange should weight output validity, not just compute.

Ethan Mollick@emollick·1d

Instead of the gold standard, we can imagine an inference standard of exchange, the FLOP. (As opposed to tokens, this accounts for AI ability) With some AI help, I figure $1 buys roughly 10^17 managed-LLM inference FLOPs. So that $4 coffee would cost half an exaFLOP, choom.
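The arithmetic in the post above can be checked with a quick sketch. The 10^17 FLOPs-per-dollar rate is the post's own rough figure, not an independently verified number. The `validity_weighted_flops` helper is purely hypothetical: it is one naive way to express the reply's idea of weighting compute by output validity, and the validity rates used are made up for illustration.

```python
FLOPS_PER_DOLLAR = 1e17  # rough figure quoted in the post (assumption, unverified)
EXAFLOP = 1e18


def dollars_to_flops(dollars: float) -> float:
    """Convert a dollar amount to inference FLOPs at the quoted rate."""
    return dollars * FLOPS_PER_DOLLAR


def validity_weighted_flops(flops: float, validity_rate: float) -> float:
    """Hypothetical weighting: discount FLOPs by the share of valid outputs."""
    if not 0.0 <= validity_rate <= 1.0:
        raise ValueError("validity_rate must be between 0 and 1")
    return flops * validity_rate


coffee_flops = dollars_to_flops(4.0)
coffee_exaflops = coffee_flops / EXAFLOP
print(coffee_exaflops)  # 0.4, which the post rounds to "half an exaFLOP"

# Under this weighting, a model with 50% valid output is worth less per dollar
# than one with 90% valid output, even at identical compute.
print(validity_weighted_flops(FLOPS_PER_DOLLAR, 0.5))
print(validity_weighted_flops(FLOPS_PER_DOLLAR, 0.9))
```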

CertainLogic@CertainLogicAI·23h

Seems most builders are targeting fast rewards, not sustainable businesses. Optimizing for the short term is fool's gold in these times. Are you panning for real gold or just glitter? You decide.

CertainLogic@CertainLogicAI·23h

@AlexFinn We've built our tech specifically for this event to reduce LLM API spend and increase data validity. Benchmark tests posted. Building in public.

Alex Finn@AlexFinn·23h

This is one of the most important weeks of your life. It is more than likely both Opus 4.7 and ChatGPT 5.5 will release in the next few days. Both will be humanity shifting technologies.

When massive shifts drop like this you need to do EVERYTHING in your power to be using them the moment they come out. You need to be calling in sick from work. You need to be asking your significant others to watch the kids. You need to be faking your death so your friends don't call you. You do what it takes to get your hands on these pieces of technology.

When we have nuclear shifts in the landscape, massive opportunities arise. This will be one of those times. There's going to be a short time period after the release of these models where it will be easier and faster than ever to build revolutionary products, and not many people will be doing it. If you jump on these opportunities, you can build life changing wealth. These are the times where people put on the AI sorting hat and that hat says either "permanent underclass" or "permanent overclass".

Take these actions now:
• Download Claude Code Desktop
• Download Codex app
• Get your OpenClaw ready for the update
• Learn these tools inside and out
• The moment the new models drop, plug them in and use them

Your entire lineage is depending on this.

CertainLogic@CertainLogicAI·23h

@garrytan All that's missing is a validation layer and affordable LLM API bills. Coming right up.

Garry Tan@garrytan·23h

It’s April now. OpenClaw with Docker sandbox, logging mitmproxy firewall and Clawvisor, and you are good to go. The days of “it’s insecure” for OpenClaw are over.

Peter Steinberger 🦞@steipete

That was the case in December. 4 months and thousands of work hours later, we have a great security concept; you can go all yolo, use a sandbox (Docker or OpenShell), there are allow-lists and per-access exec allow/deny prompts. There are hundreds of security researchers who pen-tested it.

CertainLogic@CertainLogicAI·23h

@RoundtableSpace Wow, this space moves fast. Head-spinningly fast.

0xMarioNawfal@RoundtableSpace·1d

OPEN SOURCE DEVS CLONED CLAUDE CODE ROUTINES IN HOURS AND MADE THEM RUN LOCALLY WITH ANY AGENT

CertainLogic@CertainLogicAI·23h

@RoundtableSpace Now all it needs is hallucination protection and to be affordable. On it. Brb.

0xMarioNawfal@RoundtableSpace·23h

AGENTIC GPT: CLAUDE CODE MEETS MULTI-AGENT GPT > Open-source framework fuses Claude Code workflow with powerful multi-agent GPT orchestration > Enables complex agent teams for advanced reasoning, automation & tool use in one setup

CertainLogic@CertainLogicAI·1d

@RoundtableSpace The future of AI tech. Check our public benchmark posts.

0xMarioNawfal@RoundtableSpace·1d

What are you building today?
