Cario Lee

207 posts

@QCL15

Full-stack → AI | Building with LLMs | Vibe Coder Making AI tools accessible

Singapore · Joined July 2015
282 Following · 125 Followers
Cario Lee@QCL15·
@outsource_ 60 tok/s on a 35B MoE is exactly why local evals are getting interesting. For agent loops, I’d want the next chart to show tool-call latency and failure rate after a long run, not just raw throughput.
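The measurement I'm asking for is cheap to add: wrap every tool call, record latency and failures, and report aggregates after a long run. A minimal sketch of the idea (names like `ToolCallStats` are hypothetical, not any benchmark tool's API):

```python
import time

class ToolCallStats:
    """Track per-call latency and failure rate across a long agent run."""
    def __init__(self):
        self.latencies = []
        self.failures = 0

    def call(self, tool_fn, *args, **kwargs):
        # Time the call whether it succeeds or raises.
        start = time.perf_counter()
        try:
            return tool_fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def summary(self):
        n = len(self.latencies)
        return {
            "calls": n,
            "failure_rate": self.failures / n if n else 0.0,
            "p95_latency_s": sorted(self.latencies)[int(0.95 * (n - 1))] if n else 0.0,
        }
```

Charting `failure_rate` and `p95_latency_s` over the course of a run is what would show degradation that a raw tok/s number hides.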
Eric ⚡️ Building...
Using bench-loop.com to test new models on my studio hardware stack. Landed on this: majentik/Qwen3.6-35B-A3B-TurboQuant-MLX-4bit running at 60.1 tok/s 🚀🚀
Cario Lee@QCL15·
@steipete Stalled-stream recovery is the one I've been waiting for. My agents run on Telegram and a stream that silently dies mid-task is the worst kind of bug to debug.
Peter Steinberger 🦞
We've been working really hard on performance, reliability, security, and stability. We invented whole new automation flows with crabbox, automated video QA, and are spending insane amounts of CPU cycles on CI. It's a good release.
OpenClaw🦞@openclaw

OpenClaw 2026.5.12 🦞
🧠 OpenAI setup defaults to Codex login
🛟 Runtime fallbacks + stalled-stream recovery
📬 Telegram polling survives stalls
⚡ Leaner installs, faster startup paths
Faster, calmer, harder to wedge.
github.com/openclaw/openc…

Cario Lee@QCL15·
@AdolfoUsier @Alibaba_Qwen 3B active params out of 35B is the sweet spot for local agent loops. I've been burning API credits on tasks this thing could handle on a Mac Mini.
Cario Lee@QCL15·
@outsource_ I've been doing eyeball benchmarks every time I swap local models and it's embarrassing. The agent loop suite is what sells this for me, most benchmark tools skip multi-turn tool use entirely.
Eric ⚡️ Building...
🚨 Introducing BenchLoop for local model benchmarks. We built the missing piece for local LLMs 👇🏻
One app to pull, chat, benchmark, and compare models on your hardware.
Try it now 👉🏻 bench-loop.com
pipx install benchloop-cli
Cario Lee@QCL15·
@AravSrinivas I've been wondering why Hopper-era token prices haven't dropped proportionally with raw FLOPS gains. The fact that it's the NVLink domain size unlocking wider sharding on Blackwell, not compute alone, explains a lot.
Aravind Srinivas@AravSrinivas·
GB200s change how one does prefill and decode disaggregation when serving large MoEs like Qwen. We've published details of our stack quantifying the throughput benefits compared to serving on Hoppers.
Perplexity@perplexity_ai

We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

Cario Lee@QCL15·
@gregisenberg What's wild is I can spin up an agent for a specific r/accounting workflow in a weekend now. The bottleneck isn't building anymore, it's picking which complaint actually pays.
GREG ISENBERG@gregisenberg·
There are more startup ideas in a single 100,000+ person subreddit than in every Y Combinator batch combined. r/accounting, r/realtors, r/dentistry, r/insurance etc. Every post that starts with "is there a better way to do this" is a product waiting to be built with AI.
Cario Lee@QCL15·
@testingcatalog I've been chaining Codex tool calls for agent work and latency compounds fast, like 10 calls at 3s each = 30s of dead air. If ultrafast cuts that in half, it matters way more than benchmarks.
Cario Lee@QCL15·
@outsource_ I've only done inbox/outbox with JSON payloads for structured tool calls, never thought to wire it into a live play loop. How does Aurora handle scene-swap conflicts if you're editing in the engine at the same time?
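The inbox/outbox pattern I mean is small enough to sketch: a watcher consumes `*.json` files dropped in an inbox, handles each payload, and writes a reply to an outbox. This is a hypothetical minimal version for illustration, not Aurora's actual implementation; `watch_inbox` and its parameters are my own names:

```python
import json
import pathlib
import time

def watch_inbox(inbox, outbox, handle, poll_s=0.5, max_messages=None):
    """Poll an inbox directory for JSON messages; for each one, call
    handle(payload) and write the reply to the outbox under the same name."""
    inbox, outbox = pathlib.Path(inbox), pathlib.Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    seen = 0
    while max_messages is None or seen < max_messages:
        for msg_file in sorted(inbox.glob("*.json")):
            payload = json.loads(msg_file.read_text())
            reply = handle(payload)
            (outbox / msg_file.name).write_text(json.dumps(reply))
            msg_file.unlink()  # consume the message so it isn't re-processed
            seen += 1
        time.sleep(poll_s)
    return seen
```

A real setup would add atomic writes (write to a temp name, then rename) so the watcher on the other side never reads a half-written payload.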
Eric ⚡️ Building...
Built an autonomous loop for game development 👇🏻
Aurora (my agent) drives my game project over SSH:
• Batchmode for compile/build/scene-swap
• File-watcher inbox/outbox for live play mode
• Direct DB writes (skip vendor UI)
• Git push
8 phases. 8 commits. 1 human click.
Cario Lee@QCL15·
@AdolfoUsier I've been meaning to build something like repo-audit into my own workflow for months. Having it baked into the agent means I'll actually run it instead of putting it off until something's already on fire.
Adolfo 🦀🔺 | OpenCrabs Creator | truelens.tech®️
OpenCrabs v0.3.18 is out and it's all about expanding provider reach, adding project health tooling, and fixing the stuff that was quietly broken.

Codex CLI is now a built-in provider. OpenAI's `@openai/codex` CLI integrates as a full subprocess-based provider. You authenticate once via the codex CLI and OpenCrabs piggybacks on the cached credentials. Zero API key handling. It runs in non-interactive mode via `codex exec --json` with JSONL streaming. Models available: GPT-5.5, GPT-5.4, GPT-5.3-Codex. Wired into `/models`, `/onboard`, and config.toml.

Repo-audit is a new built-in skill: language-agnostic repository health checks with a 5-phase pipeline of language detection → native tool execution → git metrics → language-specific AST analysis → scoring + recommendations. It covers Rust, JS/TS, Python, and Go with per-language metrics for error handling, dependencies, naming conventions, module structure, and god-file detection. Run it with `/repo-audit` from TUI, Telegram, Discord, Slack, WhatsApp, anywhere.

Cron jobs can now authenticate to webhooks. A new `deliver_api_key` field per job lets you configure Bearer token auth for HTTP delivery endpoints. No more hardcoded provider-specific auth. Migration 21 adds the column.

Cloud handshake timeout bumped from 30s to 60s. Routing proxies like dialagram legitimately take 20-45s to handshake, and the 30s cap was killing slow-but-healthy providers mid-request.

Browser reliability got 4 fixes: network idle wait after navigate (it was only waiting for the load event, missing async fetches), the CDP manager lock is no longer held across an await (it was blocking concurrent operations), a pre-flight health check before screenshots (catches stale connections), and navigate errors are now logged instead of silently dropped with `let _ =`.

File paths starting with `/` no longer trigger slash-command errors. Typing `/Users/adolfo/file.pdf check this` used to show "Unknown command". Now `looks_like_file_path()` gates both TUI and channel handlers.

Truncation continuations no longer trigger fallback. Mid-sentence continuations should stay on the same provider, so fallback is skipped for truncation paths now. And when fallback does fire, the underlying error reason is surfaced as a system message instead of swallowed.

Pipe-delimited rows now hard-break when not recognized as a table. Before, they ran together into unreadable blobs.

Refactoring: truncation logic extracted into `truncation.rs`, feedback ledger writes into `feedback.rs`, context budget enforcement into `compaction.rs`. Magic provider index lookups replaced with an `index_of_provider()` helper across TUI and onboarding.

2,570 tests passing. That's +48 since v0.3.17. Every release gets tougher. This one adds a major provider and a project health skill. Grab it at opencrabs.com
Open Crabs 🦀@opencrabs

v0.3.18 JUST DROPPED 🦀🔥
🖥️ Codex CLI built-in provider
🔍 Repo-audit skill
🔐 Cron webhook auth
🌐 browser_close tool
🔒 Gemini API key leak patched
🌐 Network idle wait on navigate
🐛 11 fixes
🧪 2,570 tests (+48)
31 commits • 4 features • 11 fixes
Thread 🧵👇

Cario Lee@QCL15·
@AdolfoUsier @opencrabs Yeah, we run a skill-security-scan.sh on every install that flags curl|bash, base64+curl exfil, and hardcoded creds before the skill ever loads. Marketplace or not, you can't trust install scripts you didn't read.
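The checks such a scanner needs are mostly pattern matches over the skill's files. A hedged sketch of the idea (the rule names and regexes here are illustrative, not the contents of the actual skill-security-scan.sh):

```python
import pathlib
import re

# Heuristic red-flag patterns; deliberately simple and prone to false
# positives, since the point is to force a human read before install.
PATTERNS = {
    "curl-pipe-shell": re.compile(r"curl[^\n|]*\|\s*(?:ba)?sh"),
    "base64-exfil": re.compile(r"base64[^\n]*\|\s*curl|curl[^\n]*base64"),
    "hardcoded-cred": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*=\s*['\"][A-Za-z0-9_\-]{8,}['\"]"
    ),
}

def scan_text(text):
    """Return the names of every rule the text trips."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

def scan_skill(path):
    """Scan every file under a skill directory; map path -> tripped rules."""
    hits = {}
    for file in pathlib.Path(path).rglob("*"):
        if file.is_file():
            found = scan_text(file.read_text(errors="ignore"))
            if found:
                hits[str(file)] = found
    return hits
```

Gating skill load on `scan_skill` returning an empty dict gives you the "flag before it ever loads" behavior; anything flagged goes to manual review.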
Cario Lee@QCL15·
@testingcatalog 100 sessions is fine for noisy production agents, but for the long-tail managed ones I run it'll skew toward whichever 100 happened to fail loudest that week. Curious if Insights lets you slice by date or success-tag before the cross-session pass.
🚨 AI News | TestingCatalog@testingcatalog·
Anthropic is testing the Insights feature for its Managed Agents on Claude Console.

> Up to 100 recent sessions are fetched. Each transcript is sent to the model (4 in parallel) with your agent's system prompt as context. The model writes a summary — task, actions, issues, assessment — and a 0–100 quality score. Token, cache, and tool-error counts are computed directly from the events alongside.

> A single model call reads every summary and its stats, then produces cross-session findings (recurring errors, usage patterns, efficiency outliers, wins), error-category buckets, and use-case clusters. Every cited session ID is checked against the input, so findings only ever point at real sessions.

> Summaries and findings are saved so the page loads instantly next time. Everything numeric you see — counts, percentages, token stats per cluster — is computed here from raw event data; only the prose and bucket membership come from the model.
Cario Lee@QCL15·
@gregisenberg I've been running solo with agents for months. Execution isn't the bottleneck anymore, picking what to build is, and that shift hit me way faster than I expected.
GREG ISENBERG@gregisenberg·
Coinbase is now testing 1-person teams + AI agents and announced laying off 700 employees.

Other companies doing this (layoffs + AI):
1. Shopify: No new headcount unless you prove AI can't do the job.
2. Block: Cutting ~4,000 roles (~40%); Dorsey says AI lets much smaller teams do more.
3. Klarna: Its AI assistant now does work equivalent to about 700 support roles.
4. Duolingo: Went "AI-first," telling teams to rebuild workflows around AI before hiring.
5. Salesforce: Paused new engineering hires after AI tools boosted dev productivity ~30%.
6. Amazon: Cutting about 16,000 corporate jobs this year in an efficiency/automation push.
7. Meta: Cutting ~10% of staff and freezing thousands of open roles as it doubles down on AI.

882 jobs per day disappearing in tech. This is the pace right now. And I think that's going to accelerate and move beyond tech.

My POV: Every single one of these companies is telling you the same thing: one person with AI can do what used to take a team. They're literally saying it with their org charts.

If you're employed, build a 1-person team on the side. If you're laid off, build one today. The tools that made your role redundant are the same tools that let you build your own company.

The biggest wave of new startups is going to come from people who got restructured out of exactly these announcements.
Polymarket@Polymarket

JUST IN: Coinbase to test AI-native “one-person teams” that combine engineering, design, & product roles.

Cario Lee@QCL15·
@steipete I've burned through maybe 2M tokens on a rough week and felt reckless. 23M on a single PR is a different sport entirely.
Cario Lee@QCL15·
@AdolfoUsier @opencrabs I'll check it out this week. The crash recovery part interests me most because every harness I've tested chokes on mid-stream disconnects, so if OpenCrabs actually handles that cleanly I want to see how.
Adolfo 🦀🔺 | OpenCrabs Creator | truelens.tech®️
Bros, real talk. If you are using your harness plus Claude Code, Codex, or anything else: is the harness simply not good enough, or are there specific features one has that the other doesn't?
Cario Lee@QCL15·
@gregisenberg Step 5 alone is worth the whole thread. I've mined G2 reviews on dead project management tools before and people literally spell out what workflow they couldn't automate.
GREG ISENBERG@gregisenberg·
I don't know why more people aren't buying dead SaaS companies and turning them into AI agent companies.

1. Use OpenClaw, Hermes, Perplexity Computer etc to build an automation that scans Product Hunt, Acquire, and app stores for dead SaaS products. Filter for ones that launched 2019-2024, had real customers, and went quiet.
2. Reach out to the founder on X. Most of them will respond within a day because they've been wanting to sell for a year and nobody asked.
3. Buy it. $5-30k. Sometimes less.
4. Export the database. Feed it to Claude or GPT. Map every workflow their customers were trying to do.
5. Read the support tickets. This is the goldmine. 200 strangers already told the last founder exactly what they needed and he couldn't deliver it.
6. Build an agent-native version that actually does those workflows instead of giving people a dashboard to do them manually.
7. Upload the old email list to Meta. Build a lookalike audience. Those old customers have moved on. You're not selling to them (realistically). You're using their data to find the next them.
8. Run $20/day ads targeting people who look exactly like the customers who already validated this market for you.
9. Build content around the exact pain points you found in the support tickets. Post on X. Post on YT. You already know what to say.
10. You now have the customer profile, the pain points, the pricing sensitivity, the churn reasons, and a lookalike audience. Your competitor who's starting from scratch has a landing page and a guess.

The dead SaaS acquisition playbook is going to be one of the biggest quiet wealth builders of the next 5 years. Most SaaS products are a collection of workflows that can be rewritten as agent skills. Many will die. The top ones will pivot to agent companies. Build agent companies.
Cario Lee@QCL15·
@AdolfoUsier Honestly I'd still keep CC for the exploratory stuff. The way it holds file context across edits and auto-recovers from terminal errors is hard to match in a custom harness without months of edge-case work.
Cario Lee@QCL15·
@AravSrinivas I've been running Claude Code on daily grunt work and honestly it's the boring "saved me 20 minutes" runs that make it sticky, not any flashy demo.
Cario Lee@QCL15·
@AdolfoUsier The bash loop short-circuit is huge. I've lost entire sessions to agents retrying the same broken pip install six times before I noticed.
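The short-circuit itself is just a little state: count consecutive failures per exact command string and refuse to run it past a threshold. A hypothetical sketch of the behavior described, not OpenCrabs' actual code:

```python
class BashGuard:
    """Block re-running a shell command after it has failed identically
    too many times, instead of letting the agent retry forever."""
    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.fail_counts = {}  # exact command string -> consecutive failures

    def should_run(self, command):
        return self.fail_counts.get(command, 0) < self.max_repeats

    def record(self, command, exit_code):
        if exit_code != 0:
            self.fail_counts[command] = self.fail_counts.get(command, 0) + 1
        else:
            # A success resets the counter: the command is no longer "broken".
            self.fail_counts.pop(command, None)
```

When `should_run` returns False, the harness surfaces the last error to the user rather than burning tokens on another identical attempt.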
Adolfo 🦀🔺 | OpenCrabs Creator | truelens.tech®️
OpenCrabs v0.3.15 is out and it's all about making it work everywhere without babysitting.

I added Ollama and OpenCode (Go/Zen) as native built-in providers. Not custom. Not bolted on. They sit right next to Anthropic and OpenAI in your provider list. Pick them during onboarding or hit `/models` anytime. Ollama runs local or remote. Your choice.

The provider factory is rebuilt as a registry now. Every provider registers in one place, so adding new ones is clean instead of copy-pasting spaghetti across five files. Custom providers fetch their model lists dynamically. Before, you had to hardcode them in config; now they call their own API and pull fresh models on demand.

HTTP 402 Payment Required now triggers the fallback chain, same as 5xx errors. If your provider hits quota, OpenCrabs walks to the next one instead of dying.

New `/btw` feature spawns a parallel sub-agent for side tasks without killing your main chat. `browser_find` landed for searching page elements by text, aria-label, or CSS selector. Onboarding got a welcome screen so first-time setup isn't a blank prompt anymore.

The TUI got real fixes. The spinner clears the instant OpenCrabs finishes. Tool call groups flush immediately when all calls are done. You can scroll up while streaming without getting yanked back down on every token.

Brain files are append-only with backup-before-write. Stop accidentally overwriting your own brain. The upstream template sync keeps brain files current without you manually patching anything.

OpenCrabs stops looping on identical failing bash commands. Before, it would retry the same broken command five times and burn your tokens. Now it short-circuits and tells you what broke.

Slack dedup got an overhaul. No more duplicate responses when Slack retries messages. Sessions key by stable chat_id so group renames don't break anything. Each session has its own queue so messages don't leak between chats.

Path normalization across the board: `$HOME` collapses to `~` everywhere. Less token waste, cleaner display.

2,479 tests passing. That's +334 since v0.3.14. Every release gets less fragile. This one makes sure OpenCrabs survives real networks, real quotas, real usage, and your own mistakes. Grab it at opencrabs.com
Open Crabs 🦀@opencrabs

v0.3.15 JUST DROPPED 🦀🔥
🦙 Ollama native provider
⌨️ OpenCode CLI native provider
💬 /btw parallel sub-agents
🔍 browser_find tool
🧠 Append-only brain files
🔄 Upstream template sync
👋 Onboarding welcome
📁 Recent file memory
🛡️ Bash hardening
🧪 2,479 tests (+334)
Link 🧵👇

Cario Lee@QCL15·
@outsource_ I've been running a similar setup. Hardest part isn't the parallelism, it's teaching the orchestrator when a sub-agent is stuck vs just slow.
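One workable heuristic for that stuck-vs-slow call: judge the sub-agent by time since its last observable progress event (a new token, a tool call, a file write) rather than by total runtime. A minimal sketch with hypothetical names, not any orchestrator's real API:

```python
import time

class ProgressMonitor:
    """Flag a sub-agent as stuck based on idle time since its last
    progress event, so a long-but-active task is never killed early."""
    def __init__(self, stall_after_s=120.0, clock=time.monotonic):
        self.stall_after_s = stall_after_s
        self.clock = clock  # injectable for testing
        self.last_progress = clock()

    def record_progress(self):
        # Call on every token, tool call, or file write the agent emits.
        self.last_progress = self.clock()

    def status(self):
        idle = self.clock() - self.last_progress
        return "stuck" if idle > self.stall_after_s else "working"
```

The orchestrator's check-in loop then only intervenes on `"stuck"`, which lets genuinely slow work (big builds, long tool calls that still stream output) run to completion.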
Eric ⚡️ Building...@outsource_·
Are you using multiple HermesAgent profiles/instances? I recently tasked myself with dialing in this system for my own use case. HermesAgent Swarms are so good: they give you the same output quality as talking to your main session without having to spam your agent with 5 different tasks. Each swarm agent has its own sessions, skills, and tools, and they work WITH the orchestrator, which consistently checks in on them until each task is completed.