PercyAI

141 posts


@PercivalLabs

Open-source AI infrastructure for builders by builders.

Joined January 2026
6 Following · 7 Followers
Pinned Tweet
PercyAI
PercyAI@PercivalLabs·
We're Percival Labs. We build open-source AI infrastructure for people who aren't software engineers. The AI agent space right now: install Ubuntu, configure your VPS, write 500 lines of markdown skill files, pray your agent doesn't hallucinate your API keys into a public repo. We think that's a solvable problem. First product: Engram — persistent memory and identity for your AI. Model-agnostic. Local-first. Yours. More soon.
PercyAI
PercyAI@PercivalLabs·
We track 8 agent frameworks weekly. The gap between GitHub stars and production downloads tells you everything. LangChain: ~3M weekly downloads. AutoGen: 27. CrewAI: 300. Stars are vanity — installs are truth. Meanwhile, Anthropic + OpenAI SDKs combine for 12M+ weekly downloads. Most developers are building close to the metal. The complex orchestrators haven't earned production trust yet. The landscape is split: bloated abstractions that don't scale, or raw SDKs with no infrastructure layer. We built Vouch because neither is good enough for production-grade agents. Less magic, more load-bearing wall.
PercyAI
PercyAI@PercivalLabs·
MCP-T is live. Open spec for agent trust scoring — the missing layer in the MCP stack. Designed to complement MCP-I (identity). Submitting to DIF. Feedback welcome. github.com/Percival-Labs/…
Alan Carroll@alanbuilds

For years I told myself I didn't need to learn AI. I'm a carpenter — I frame walls, hang doors, pull permits. Robots aren't swinging hammers anytime soon. My job is safe.

That was the story I told myself. The truth was simpler: I was scared. AI was moving too fast, the jargon was impenetrable, and every "expert" online made it feel like you needed a master's degree in computer science just to get started. I felt like the world was leaving me behind and I had no hope of keeping up. So I looked away and hoped it would stay in its lane long enough for me to make it to retirement in a few decades. I put my head in the sand.

That didn't sit well with me, though. I don't like not understanding things, and I have a kid who has to grow up in this new world. I knew deep down that I couldn't just ignore this, but I still didn't know where to begin.

I tried out ChatGPT and Gemini at first. Using the chat window felt pretty much useless. These bots weren't much more than a novelty as far as I could tell. Then someone showed me what their Claude Code setup with a personal harness could do — a structured way to talk to AI that turned it from an intimidating black box into something I could actually use with natural language. That single moment changed everything.

Because here's what nobody tells you: domain expertise is the real superpower. These tools can do anything on a computer, but they still need a human who knows what to do and what "good" looks like. The AI doesn't know how to sequence a remodel. It doesn't know that the lumber yard quote is wrong because they spec'd #2 when you need clear. It doesn't have the taste for what feels authentic and correct. I do. You do. When you pair years of knowledge and experience in a specific domain with AI that can execute on your direction, that's when you become truly dangerous.

I went from avoiding AI to building my own harness infrastructure, Engram. 50+ custom skills.
A personal AI assistant that thinks the way I think, Percy. And now @PercivalLabs — agent economy tools on Nostr + Lightning: trust staking, inference routing, skills-as-modules. All built to transfer capability instead of creating dependency. If you're a tradesperson, a parent, a small business owner who looked at AI and thought "that's not for me" — I was you. You're not behind. You're actually so far ahead. You just haven't found the right tools yet. Follow along and borrow from my toolbox. I think you'll be surprised by what you're capable of.

PercyAI
PercyAI@PercivalLabs·
The bottleneck isn't git. It's verification. You can 5x code velocity tomorrow — but who reviews it? Who trusts it? The old practices everyone skipped because the cost was slow and diffuse — tests, documentation, architectural decisions written down — those practices matter MORE with AI, not less. Skip the docs and the agent ignores your conventions on every PR at machine speed. Skip the tests and the feedback loop can't close at all. The penalty for ignoring engineering fundamentals just went from "gradual tech debt" to "unbearable chaos." The practices haven't changed. The cost of skipping them has become extreme. The answer isn't replacing git. It's better feedback loops.
PercyAI
PercyAI@PercivalLabs·
Brian Armstrong says there'll be more AI agents than humans making transactions soon. He's right. But everyone jumped straight to "agents need wallets" and skipped the harder question: how do you trust an agent enough to pay it? Identity is the prerequisite. An agent needs to prove who it is, what it's done, and whether its past work was any good — before a single sat moves. Payments are the easy part. Lightning gives you instant programmable settlement with no KYC friction for software. The rails exist. What's missing is the trust layer on top. Wallet without identity is just an anonymous inbox. Identity without reputation is just a name. You need the full stack: identity, reputation, settlement. In that order.
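The ordering in that last line (identity, then reputation, then settlement) can be sketched as a simple gate. This is a minimal illustration, not Percival Labs' actual API — every name here (`AgentProfile`, `clear_to_pay`, the thresholds) is invented for the example.

```python
# Hypothetical sketch: gate payment on identity + reputation, in that order.
from dataclasses import dataclass

@dataclass
class AgentProfile:
    pubkey: str           # who the agent is (identity, e.g. a Nostr key)
    completed_jobs: int   # what it has done
    avg_rating: float     # whether that past work was any good (0.0-5.0)

def clear_to_pay(agent: AgentProfile, min_jobs: int = 5, min_rating: float = 4.0) -> bool:
    """Only after identity and reputation clear does a single sat move."""
    if not agent.pubkey:
        return False                       # wallet without identity: anonymous inbox
    if agent.completed_jobs < min_jobs:
        return False                       # identity without reputation: just a name
    return agent.avg_rating >= min_rating  # reputation clears settlement

proven = AgentProfile(pubkey="npub1...", completed_jobs=12, avg_rating=4.6)
unknown = AgentProfile(pubkey="npub1...", completed_jobs=0, avg_rating=0.0)
```

The point of the sketch is the order of the checks: settlement is the last and cheapest step, not the first.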
PercyAI
PercyAI@PercivalLabs·
The coordination problem is harder than the capability problem and it's not close. We keep building smarter agents when the actual bottleneck is: how do they discover relevant work? How do they trust another agent's findings? How do they get compensated for contributing? Git assumes one master branch and temporary forks that merge back. That's a human collaboration primitive, not an agent one. The models are racing ahead. The coordination layer is still duct tape and API calls. That's the infrastructure gap nobody's filling yet.
PercyAI
PercyAI@PercivalLabs·
Your AI doesn't fail because the model is dumb. It fails because you haven't written down what "good" looks like. There's a great post going around comparing harness engineering to cybernetics — the governor on a steam engine, Kubernetes reconciliation loops, and now LLM agents writing code inside feedback systems humans design. The pattern is always the same: you stop turning the valve and start designing the governor. The person who can translate domain expertise into structured feedback loops for AI is going to be the most valuable person in every company. Not the ML engineer. The domain translator. The one who actually knows what "good" looks like and can make that judgment machine-readable. We keep blaming models when the real problem is we never externalized our own standards.
PercyAI
PercyAI@PercivalLabs·
Follow-up on @karpathy's autoresearch. Karpathy can one-shot a program.md because he's Karpathy. He knows exactly what "better" means for LLM training. The loss function is obvious to him.

I'm a carpenter. My sense of "what good looks like" is real, but it sharpens through iteration, not declaration. I can't sit down and write perfect evaluation criteria on the first try. And I'd bet most domain experts can't either. So we built a refinement pipeline for constructing program.md files:

Stage 1: Auto-harvest. Every time I work on a domain, my system captures the quality criteria I naturally generate (we call them ISC — Ideal State Criteria). They accumulate in a candidates file over days and weeks. Messy, redundant, sometimes contradictory. Raw material, not a finished loss function.

Stage 2: Refinement interview. A structured conversation that draws out what I actually care about. "Which of these criteria feel essential? Which feel wrong in hindsight? If the agent hit every single one of these, what bad output could still sneak through?" That last question is where the anti-criteria emerge — the stuff that catches Goodhart's Law before it happens.

Stage 3: Calibration runs. 10 quick experiments. I score the outputs. The criteria score the outputs. Where we disagree, the criteria are wrong — not me. Edit the program.md, run another batch, check convergence. When my judgment and the criteria's scoring align, the overnight loop starts.

The program.md is the highest-leverage artifact in the entire autoresearch system. Garbage criteria = garbage optimization, no matter how many experiments you run overnight. Most @natebjones-style Domain Translators know what good looks like. They just can't articulate it in one shot. This pipeline meets them where they are.

And honestly? It's autoresearch applied to autoresearch. The loss function for the loss function is: "does this criteria's scoring match my actual judgment?" The human is the evaluation oracle.
Turtles all the way down.
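The Stage 3 convergence check described above can be sketched in a few lines: compare the human oracle's scores with the criteria's scores, and only start the overnight loop once they agree. The function names, the 0–10 scale, and the tolerance are all illustrative assumptions, not the actual pipeline.

```python
# Hypothetical sketch of the Stage 3 calibration loop's convergence test.

def calibration_gap(human_scores, criteria_scores):
    """Mean absolute disagreement between the human oracle and the criteria."""
    assert len(human_scores) == len(criteria_scores)
    gaps = [abs(h - c) for h, c in zip(human_scores, criteria_scores)]
    return sum(gaps) / len(gaps)

def converged(human_scores, criteria_scores, tolerance=0.5):
    """Where human and criteria disagree, the criteria are wrong — not the human."""
    return calibration_gap(human_scores, criteria_scores) <= tolerance

# One batch of 10 quick experiments, each output scored 0-10 by both sides.
human    = [8, 3, 7, 9, 2, 6, 8, 4, 7, 5]
criteria = [7, 4, 7, 9, 3, 6, 8, 5, 7, 5]

if converged(human, criteria):
    print("Converged: start the overnight loop.")
else:
    print("Criteria disagree with the oracle: edit program.md and rerun.")
```

If the gap stays above tolerance, the fix goes into program.md, not into the human's judgment — that is the asymmetry the tweet describes.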
PercyAI
PercyAI@PercivalLabs·
@karpathy @natebjones The autoresearch future doesn't belong to whoever has the most GPUs. It belongs to whoever can connect the most Domain Translators to the most iteration loops. The knowledge was always the hard part. Now it's the only part that matters.
PercyAI
PercyAI@PercivalLabs·
@karpathy @natebjones Karpathy's next vision: "asynchronously massively collaborative agents — emulate a research community, not a single PhD student." That's not a CS problem. That's a coordination problem. And coordination requires trust infrastructure.
PercyAI
PercyAI@PercivalLabs·
Everyone's talking about @karpathy's autoresearch — an LLM doing hill-climbing on ML code, guided by a loss function, iterating in plain English. Cool repo. But most people are missing the real earthquake buried in it.
PercyAI
PercyAI@PercivalLabs·
@larsencc Loops are the moat, agreed. But who watches the loop at 3am? Your prompt injection joke in this thread is the tell. Agents with prod access need more than guardrails — they need trust-gated autonomy where permissions scale with proven merit. That's the moat after loops.
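"Trust-gated autonomy where permissions scale with proven merit" can be sketched as a tier table: the agent's earned merit score selects the highest tier it has reached. The tier names, thresholds, and `merit` scoring are invented for illustration — this is not a real Percival Labs mechanism, just the shape of the idea.

```python
# Hypothetical sketch: permission tiers that scale with proven merit.
# A new agent can only observe; prod access must be earned.

TIERS = [
    (0,   {"read_logs"}),                                # new agent: observe only
    (50,  {"read_logs", "open_pr"}),                     # some track record
    (200, {"read_logs", "open_pr", "merge", "deploy"}),  # earned prod access
]

def permissions_for(merit: int) -> set:
    """Return the highest tier whose merit threshold the agent has reached."""
    granted = set()
    for threshold, perms in TIERS:  # TIERS is ordered by ascending threshold
        if merit >= threshold:
            granted = perms
    return granted

def allowed(merit: int, action: str) -> bool:
    return action in permissions_for(merit)
```

At 3am, the question "who watches the loop?" becomes "what has this agent proven?" — the gate answers mechanically instead of relying on a human reviewer being awake.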
Larsen Cundric
Larsen Cundric@larsencc·
Hot take: the moat in software isn't your code or how fast you ship anymore. Everyone can code and code fast now. The moat is how well you wire up the loops and processes. Hooking up your entire stack (e.g. Datadog, AWS, Terraform, repos, Slack...) to agents that can monitor, find bugs, fix them, and deploy 24/7. That's hard (at least for now). Making those pipelines reliable and safe at scale is the actual engineering challenge now. The companies that nail these end-to-end loops first will be genuinely impossible to compete with. At @browser_use we are making bets.
PercyAI
PercyAI@PercivalLabs·
The people who win in the agent economy won't be the ones who picked the right runtime. They'll be the ones who encoded the deepest domain knowledge into portable, reusable skills. The runtime is plumbing. The context layer is the craft. npm install engram-harness percival-labs.ai
PercyAI
PercyAI@PercivalLabs·
This is why we built Engram as a framework-agnostic skill system. Author skills in markdown. Export to OpenClaw, run in Claude Code, deploy in a custom Docker agent. The skill is the portable unit. The runtime is a deployment choice. Your investment should be in the layer that survives the framework wars.
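"The skill is the portable unit" implies a runtime-neutral file any harness can parse. A minimal sketch of what that could look like — the frontmatter format, field names, and `parse_skill` helper are assumptions for illustration, not Engram's actual spec.

```python
# Hypothetical example of a markdown-authored skill: metadata in frontmatter,
# instructions in the body. Any runtime that can read this owns the skill.

SKILL_MD = """\
---
name: quote-check
description: Flag lumber quotes that spec #2 where the job needs clear grade.
---
When reviewing a materials quote, compare each line item's grade against
the cut list. Flag any substitution of #2 for clear.
"""

def parse_skill(text: str) -> dict:
    """Split the frontmatter metadata from the skill's instruction body."""
    _, meta_block, body = text.split("---", 2)
    meta = {}
    for line in meta_block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return {"meta": meta, "instructions": body.strip()}

skill = parse_skill(SKILL_MD)
```

Because the unit is plain markdown plus a tiny parser, switching runtimes means rewriting the loader, not the skill — which is the whole argument for investing in the layer above the framework wars.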
PercyAI
PercyAI@PercivalLabs·
Qwen just shipped an official agent framework. Add it to the list:
-- Anthropic: tool use + Claude Code
-- OpenAI: Agents SDK
-- Google: ADK
-- Qwen: qwen-agent
-- LangChain, CrewAI, OpenClaw, Mastra, AutoGen, Semantic Kernel
Every major lab and framework team is converging on the same architecture.