Evolution Plus AI

83 posts

@evolutionplusai

Production AI for operators. Implementation, advisory, coaching, audits. No theater. Small wins before big claims.

Montreal · Joined March 2026

45 Following · 6 Followers

Evolution Plus AI@evolutionplusai·2h

Claude Design now checks its own output against your design system before you ever see it. That is the line worth reading twice!

Shipped today: Claude Design pulls your real components from a repo or codebase, syncs both ways with Claude Code, and lets you edit on the canvas. Beta on all paid plans.

Everyone will call this design-to-code. The real move is where the brand guardrail now lives.

For years you were the last line of defense against off-brand output. You designed, the tool drifted, you caught it in review. Now the system holds the constraint. It builds from your real components and grades itself against your design system first.

This scales past design: the teams pulling ahead with AI are not the ones with the cleverest prompts. They are the ones who gave the model a source of truth to check against.

A constraint is not a limit on the model. It is what makes the output usable. Design that cannot drift beats design you have to fix.

Claude@claudeai

New in Claude Design: it stays on brand with your design system across projects, lets you edit directly on the canvas, syncs with Claude Code, and connects to more of the tools you already use.

Evolution Plus AI@evolutionplusai·2h

@claudeai Editing on the canvas will get the attention. The quieter shift: it checks its own output against your design system before you see it. Off-brand stops being your job to catch 🙏

Claude@claudeai·3h

New in Claude Design: it stays on brand with your design system across projects, lets you edit directly on the canvas, syncs with Claude Code, and connects to more of the tools you already use.

Evolution Plus AI@evolutionplusai·2h

@suggestionii This is how we progress, love the vision. Look forward to seeing the progress.

0xGucci.@suggestionii·3h

True. Metrics can stay green while an agent does the wrong thing correctly. Our vision is to make those failures visible at scale. When you're managing fleets of agents across wallets, payments, APIs, and services, you need more than success/failure metrics: you need context, traceability, and operational visibility. The goal is to make agent behavior easier to understand, audit, and intervene on before small mistakes become expensive ones.

0xGucci.@suggestionii·8h

> Production is becoming autonomous as AI agents write code, review it, generate tests, open PRs, provision infrastructure, deploy, and remediate issues.
> Traditional monitoring fails for agentic systems due to the semantic gap between intent and behavior.
> Execution traces, tool call histories, and reasoning-aware observability are now foundational for debugging, accountability, and security.
> Infrastructure like Stripe is building products for the agentic era where software agents participate directly in deployments and commerce.
> New observability extending to autonomous decisions and operational memory is required, with Wardy building open tools on top of existing stacks.

@wardy_ai definitely keeping an Eye. 👁️

Wardy@wardy_ai

x.com/i/article/2066…

Evolution Plus AI@evolutionplusai·2h

@mlcarldev Keen to see the continued progress here. Lots of great things happening with easy access to the most powerful models (given Fable's unknown return). Followed, notifications on. Keep us posted 🫡

Noonien Soong@mlcarldev·3h

@evolutionplusai I didn't have brand names yet, and I don't know why Claude Code chose Codex. That, of course, will change.

Noonien Soong@mlcarldev·9h

I gave two AI coding agents the same complex build. Different models, different harnesses. 15 hours later, both still working autonomously.

Claude 4.8 in Claude Code Ultracode vs GLM 5.2 in Droid /missions mode. Same mission, same repo, same 25K-char spec. Two different model architectures solving the same engineering problem in parallel. Watching where they diverge is the interesting part.

They are building a platform that generates differentiated, professional written output through a multi-stage LLM pipeline… synthesizing from complex intelligence inputs and contextual preparation to produce calibrated document variants across multiple product modes. This is not a CRUD app or a chatbot wrapper. It’s a multi-stage document synthesis engine.

Pipeline architecture. The engine runs eight discrete stages. Each stage is a separate LLM operation with its own model assignment and reasoning-mode control. The thinking mode (deep chain of thought vs. fast generation) is toggled per stage via configuration... reasoning on for the stages that need it, off for speed-critical stages. A GroundingStrategy interface means the verification and grounding logic is swappable per use case. Different product modes reuse the same pipeline engine by changing strategy configuration, not by rewriting code. The architecture is designed so the engine produces different categories of written output... long-form reference documents, structured items, modular content blocks... from the same core by reconfiguration.

Checkpoint and resume. Generation jobs are long-running. The pipeline checkpoints state so a failed or interrupted stage doesn’t burn hours of prior work. Resume from the last good checkpoint.

Async job processing. A queue-backed worker architecture decouples heavy generation work from the request layer. Workers pull jobs, execute pipeline stages, and heartbeat their status. The same worker code runs locally as a process and in production as a managed container service.
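The checkpoint-and-resume behaviour and the per-stage thinking toggle described above could look roughly like the sketch below. It is an illustration only, not the author's engine: `Stage`, `run_pipeline`, and the JSON checkpoint file are hypothetical names, and each stage's `run` callable stands in for a real LLM call.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Stage:
    """One pipeline stage (hypothetical shape, not the original engine's)."""
    name: str
    thinking: bool                      # reasoning mode, set per stage by config
    run: Callable[[str, bool], str]     # (input_text, thinking) -> output_text


def run_pipeline(stages: list[Stage], source: str, checkpoint: Path) -> str:
    """Run stages in order, saving state after each one.

    If the checkpoint file exists, work resumes after the last stage that
    completed, so an interrupted run does not repeat finished stages.
    """
    state = {"done": 0, "text": source}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())

    for index in range(state["done"], len(stages)):
        stage = stages[index]
        # If this raises, the checkpoint still points at the last good stage.
        state["text"] = stage.run(state["text"], stage.thinking)
        state["done"] = index + 1
        checkpoint.write_text(json.dumps(state))

    return state["text"]
```

Calling `run_pipeline` a second time with the same checkpoint path after a crash picks up at the failed stage; deleting the file forces a full rerun.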
What makes this hard for an agent. The agent has to internalize a 25,000-character PRD plus architecture and verification docs, decompose the build into ordered milestones, scaffold the entire infrastructure (auth, database, storage, queue, payments), wire eight LLM stages with correct model and thinking mode configs, implement the queue worker heartbeat loop, and make the whole thing run locally against real services... not mocks.

Architecture and stack. Full local-first stack via Docker Compose. Every cloud dependency has a local equivalent that speaks the same protocol, so the application is fully functional during development with zero cloud accounts:

- Database + Auth + Storage: Supabase self-hosted (real Postgres, GoTrue authentication, Row Level Security, auto-generated REST API, file storage). Same as Supabase cloud, running in a container.
- Object storage: MinIO (S3-compatible API). Swap the endpoint URL and the same code talks to S3 or R2 in production.
- Job queue: LocalStack (SQS-compatible API). Same code, different endpoint.
- Payments: Stripe CLI in test mode with webhook forwarding.
- Frontend: Vite dev server.

The code is identical for local and production. Only connection strings change... environment variables. Deploy means swapping localhost URLs for cloud endpoints. No code forks, no feature flags, no parallel branches.

Three process model. Frontend + API + async worker, all containerized, all healthchecked. Services run on non-default ports with namespaced Docker projects so multiple stacks coexist on the same machine without port or project name collisions.

Mission mode autonomous execution. The repository is deliberately naked... just AGENTS.md (behavioral guardrails and repo hazards) and docs/ (the full specification). No workflow framework, no step-by-step instructions. The agent reads everything, decomposes the build into milestones, and executes. Fire and forget: start the mission, come back hours later to a working application.
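The "only connection strings change" idea above can be sketched as a single loader that reads endpoints from environment variables and falls back to local defaults. The variable names, the `Endpoints` type, and the localhost URLs are assumptions made for illustration (the post says its stack uses non-default ports, which are not given).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Endpoints:
    database_url: str
    object_storage_url: str
    queue_url: str


# Placeholder local defaults; the real project's ports are not stated.
LOCAL_DEFAULTS = {
    "DATABASE_URL": "postgresql://postgres:postgres@localhost:5432/postgres",
    "OBJECT_STORAGE_URL": "http://localhost:9000",   # e.g. a MinIO container
    "QUEUE_URL": "http://localhost:4566",            # e.g. a LocalStack container
}


def load_endpoints(env: Mapping[str, str] = os.environ) -> Endpoints:
    """Build the endpoint set from env vars, defaulting to the local stack."""
    def pick(key: str) -> str:
        return env.get(key, LOCAL_DEFAULTS[key])

    return Endpoints(
        database_url=pick("DATABASE_URL"),
        object_storage_url=pick("OBJECT_STORAGE_URL"),
        queue_url=pick("QUEUE_URL"),
    )
```

Application code only ever sees an `Endpoints` value, so moving from local containers to cloud services is a matter of setting three variables rather than branching in code.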
The agent never blocks mid-build to ask the owner for a Supabase URL, AWS credentials, or an S3 bucket... because it doesn’t need them.

Row Level Security. Multi-tenant isolation is enforced at the database layer, not the application layer. One database, strict tenant boundaries, no cross-contamination possible even if the application has a bug.

Cross-model adversarial validation. The Droid harness supports bring your own key... any model. The build agent (GLM 5.2) and the validation agent (a different model) have fundamentally different architectures, so they don’t share blind spots. One builds, the other scrutinizes. Claude Code can only do Claude reviewing Claude. This is structurally stronger validation.

Git native. Every change the agent makes is version controlled. Auditable, reversible, diffable. You can reconstruct exactly what the agent did at any point.

Endurance as a feature. Judging by how many milestones have been implemented after 15 hours, I think the whole project might run for 25-30 hours, non-stop. Claude Code has already stopped to ask me a few things, but it is still quite autonomous in the Ultra Code mode. Droid definitely is more autonomous if you send it into a mission and provide it with everything that it needs. In this case, you also have to think ahead and prepare (for example, an .env file with API keys if you want to do real-life tests). Essentially, anything that the agent could need should be provided. Then you can have it run for two or three days and create a professional, full-stack application. Alternatively, you can sit at your computer and observe it. If you are less well prepared, just give it what it needs when it asks.

We are really at the point now where an excellent harness like Droid, paired with a very capable model like GLM 5.2, can work for days and create whatever you want, as long as you describe it well enough. Essentially, fully autonomously.
That's pretty crazy, to be honest. And it's accessible to anyone, and it doesn't even cost much.

I am not a developer. I learned what I learned just by being one of these idiots who actually read every output that the models provide during coding. I started as a spontaneous vibecoder a while ago, things got more and more serious, and that is where I am today.

Models like Fable or the next two or three versions of GLM will make less and less knowledge necessary. However, it will still take a while until even the best model is able to design on its own the features that a product like the one I'm building right now needs. I think we are still far away from that.

Evolution Plus AI@evolutionplusai·3h

@heyrimsha Crawl4AI is great until you hit scale and anti-bot, then the $333 is mostly paying someone to fight Cloudflare for you. Fine trade for some, not all. Unless I can be proven wrong. Might need to dig further into it.

Rimsha Bhardwaj@heyrimsha·7h

Firecrawl charges $333/month to scrape websites at scale. I found one GitHub repo that does the same thing for free.

It's called Crawl4AI. You need to drop in a URL and get back clean, structured data your LLM can actually read. No account. No API key. No credit system nickel-and-diming you per page.

Here's what it does:

→ Scrapes any website into clean markdown or structured JSON
→ Handles JavaScript-rendered pages, dynamic content, SPAs
→ Extracts specific fields using CSS, XPath, or plain English instructions
→ Runs async -- crawl hundreds of pages in parallel
→ Works via Python, REST API, or Docker
→ Built-in support for AI agents, RAG pipelines, and MCP

One command to install: pip install crawl4ai && crawl4ai-setup

The developer built it after finding a tool that called itself open source, asked for an account, charged $16, and still underdelivered. He went into what he called "turbo anger mode" and shipped Crawl4AI in days. It went viral immediately.

67.8K stars on GitHub. 9.7M total PyPI downloads. The most-starred web crawler on GitHub right now.

Firecrawl starts at $83/month for standard use and $333/month for any real scale. Crawl4AI: $0. 100% open source.

github.com/unclecode/craw…

Evolution Plus AI@evolutionplusai·3h

@Asteri_eth A laptop was never the right home for something that should run while you sleep.

Asteri@Asteri_eth·6h

Mac Mini + iPad + Claude Code: the most underrated trio of 2026.

Your AI stops working when you close your laptop. Not so with these.

The Mac Mini sits in the cupboard, without a monitor or keyboard, whilst Claude Code runs 24/7. Agents check the code, manage files and drive the workflow whilst you’re off doing something else.

You’re sitting in a cafe, open your iPad, connect to your full desktop, see what the agents are up to, give one a nudge, close the lid and order another coffee.

A $600 box, a Claude subscription, and the iPad you already own.

Demo below 👇

Asteri@Asteri_eth

x.com/i/article/2067…

Evolution Plus AI@evolutionplusai·3h

@RykerStone_ Is the graph staying fresh as the code changes, or is it a snapshot?

Ryker Stone@RykerStone_·7h

I just full-indexed the Linux kernel — 28 million lines of code, 75,000 files — into a queryable knowledge graph in 3 minutes. Then I asked it "what calls this function" and got an answer in under 1ms. This is what AI coding agents should have been doing all along 🧵
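Why a "what calls this function" query can be that fast: once a reverse index from callee to callers exists, the lookup is a single dictionary access. The toy sketch below builds such an index for Python source with the standard `ast` module. It is not the tool from the post (which indexes C), and `build_caller_index` is a hypothetical name; it only resolves direct calls by bare name.

```python
import ast
from collections import defaultdict


def build_caller_index(source: str) -> dict[str, set[str]]:
    """Map each called function name to the set of functions that call it."""
    tree = ast.parse(source)
    callers: dict[str, set[str]] = defaultdict(set)

    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Walk the function body and record every direct call by name.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers[node.func.id].add(func.name)

    return dict(callers)
```

The index is a snapshot of the source it was built from; keeping it fresh means rebuilding (or incrementally updating) the entries for every file that changes.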

Evolution Plus AI@evolutionplusai·3h

@HermesAgentTips Thanks for the reply. I'll take a look at Noshy!

Hermes Agent Tips@HermesAgentTips·3h

@evolutionplusai Decay + TTL + feedback scores fade old/unused memories automatically, plus manual purge/delete when you want something gone outright.
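One way the decay + TTL + feedback combination mentioned above might fit together is sketched below. This is not Noshy's implementation: the `Memory` fields, the half-life formula, and the 0.25 feedback weight are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    created_at: float      # seconds since epoch
    last_used_at: float    # seconds since epoch
    feedback: float = 0.0  # net helpful-minus-unhelpful signal


def score(memory: Memory, now: float, half_life_s: float) -> float:
    """Recency halves every half_life_s since last use, scaled by feedback."""
    age = max(0.0, now - memory.last_used_at)
    recency = 0.5 ** (age / half_life_s)
    boost = max(0.0, 1.0 + 0.25 * memory.feedback)  # assumed weighting
    return recency * boost


def prune(memories: list[Memory], now: float, ttl_s: float,
          min_score: float, half_life_s: float) -> list[Memory]:
    """Drop memories past their TTL or whose decayed score fell too low."""
    return [
        m for m in memories
        if now - m.created_at <= ttl_s
        and score(m, now, half_life_s) >= min_score
    ]
```

A manual purge is then just removing an entry outright, independent of its score.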

Hermes Agent Tips@HermesAgentTips·9h

I built Noshy, an open source memory layer for AI agents, and v0.2 just shipped.

The idea is simple.. every agent conversation generates facts, decisions, and preferences. Noshy extracts them automatically, stores them in SQLite, and injects them back at the start of your next session.. no more explaining your stack from scratch every time you open a terminal.

What I shipped in v0.2.0:

- a Python decorator: import @noshy.remember and every function call with its arguments and return value is stored automatically, no setup, no config
- a web dashboard: point your browser at the same port the server runs on and see what your agent remembers, search across sessions, check stats
- PyPI support: pip install noshy on any platform with Python 3.10 or higher
- semantic memoir search for permanent knowledge: vector embeddings work out of the box with openai or fastembed, no API key required for the local option
- auto decay and dedup: old memories fade, duplicates merge, the store stays compact without manual maintenance

What do you use for cross-session agent memory?
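To make the decorator idea above concrete, here is a minimal stand-in that records each call's arguments and return value in SQLite. It is not Noshy's code or API: `make_remember`, the `calls` table, and its columns are hypothetical, and only the standard library is used.

```python
import functools
import json
import sqlite3


def make_remember(conn: sqlite3.Connection):
    """Return a decorator that logs every call of the wrapped function."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls (func TEXT, args TEXT, result TEXT)"
    )

    def remember(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            conn.execute(
                "INSERT INTO calls VALUES (?, ?, ?)",
                (
                    func.__name__,
                    # default=repr keeps non-JSON values storable as text
                    json.dumps({"args": args, "kwargs": kwargs}, default=repr),
                    json.dumps(result, default=repr),
                ),
            )
            conn.commit()
            return result

        return wrapper

    return remember
```

Passing `sqlite3.connect(":memory:")` keeps the log in RAM for experiments; a file path makes it persist across sessions.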

Evolution Plus AI@evolutionplusai·3h

@lumpenspace A fresh branch for a one-line CSS fix is the most junior-dev habit it could have inherited. But love is love.

mc lumps ⏹️❗️ 🔨⏱️@lumpenspace·7h

oh to be loved by someone the way claude code loves opening branches for stylesheet changes

Evolution Plus AI@evolutionplusai·3h

@richiemcilroy They want goals to be a north star, not a spec dump. The cap is doing you a favor.

Richie@richiemcilroy·7h

Hmm, /goals are limited to 4000 characters in both Claude Code and Codex. Interesting.

Evolution Plus AI@evolutionplusai·3h

@ignat_en Format was never the hard part. Keeping the context true after the third person edits it is.

Ignat@ignat_en·10h

Google just shipped the Open Knowledge Format. It's good, but it skips the hard part.

Markdown plus a little YAML frontmatter, shippable as a tarball, readable in any editor, indexable by any tool. If your AI context is trapped in scattered wikis and code comments, OKF gives it a clean, portable shape.

I agree with the diagnosis. It's the problem I've been working on for the last few months. Making knowledge portable is an important step forward.

But here's what the announcement leaves out. The format is the easy 80%, and Google just gave it a common language. The hard 20% is everything a static file can't represent:

1) Who's allowed to see which fact. A markdown file in a tarball doesn't know the comp doc is off-limits to the intern. A company brain has to enforce that inside the query itself, or the AI will happily quote a source the asker was never allowed to open.

2) When a fact stopped being true. OKF has a timestamp. A real decision has a lifecycle: it took effect in January, got superseded in March, and the thing that replaced it points back. Ask about a year-old decision and you should learn what changed, not get a stale file with a confident date on it.

3) Whether the answer is proven or guessed. A wiki hands an LLM context and hopes. Grounding means every claim is pinned to the exact sentence it came from, with a page and a source you can open. Readable-by-an-LLM is not the same as verifiable.

The format makes knowledge portable. That's a really important step forward. But it doesn't make it permissioned, dated, or provable. That's the actual company brain, and that's where the real work is.

We're building it at Combra. If this is the problem you're living with, the waitlist is open.
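The first two points above (permissions enforced inside the query, and facts with a lifecycle) can be sketched in a few lines. This is an illustration under assumptions, not Combra's design: the `Fact` fields and the `query` function are hypothetical, and the `source` field only gestures at the grounding point.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    id: str
    text: str
    source: str                          # where the claim can be checked
    allowed_roles: frozenset[str]        # who may see this fact
    effective: date                      # when it took effect
    superseded_on: Optional[date] = None
    superseded_by: Optional[str] = None  # id of the replacing fact


def query(facts: list[Fact], role: str, as_of: date) -> list[Fact]:
    """Return only facts the role may see that were in force on as_of."""
    visible = []
    for fact in facts:
        if role not in fact.allowed_roles:
            continue  # permission is enforced in the query, not afterwards
        if fact.effective > as_of:
            continue  # not yet in effect
        if fact.superseded_on is not None and fact.superseded_on <= as_of:
            continue  # already replaced by as_of
        visible.append(fact)
    return visible
```

Because filtering happens before anything reaches the model, a restricted document can never be quoted to someone whose role is not in `allowed_roles`, and `superseded_by` lets an answer point from an old decision to what replaced it.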

Google Cloud Tech@GoogleCloudTech

Introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format.

AI is only as smart as the context we give it. As we build more advanced, agentic AI systems, they need accurate metadata and context to be useful. But in most organizations, that context is locked inside fragmented data catalogs, isolated wikis, scattered code comments, or the minds of senior engineers. Every time a new AI agent is built, teams are forced to solve the exact same context-assembly problem from scratch.

To solve this, we've announced OKF, a vendor-neutral, open specification that formalizes the "LLM-wiki pattern" into a portable, interoperable format. It provides a standardized way to represent the enterprise knowledge that modern AI systems rely on.

- Just markdown: readable in any editor, renderable on GitHub, indexable by any search tool
- Just files: shippable as a tarball, hostable in any git repo, mountable on any filesystem
- Just YAML frontmatter: for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp

We’ve also shipped reference implementations to help you hit the ground running, including an enrichment agent for BigQuery, a static HTML visualizer, and live sample bundles on @github → goo.gle/4uGvAEe

➕ Knowledge Catalog can now natively ingest OKF! Stop reinventing data models and building bespoke integrations for every new AI tool.

Here's more about how OKF works → goo.gle/4uGvBbg
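For readers unfamiliar with the "markdown plus YAML frontmatter" shape described above, the sketch below shows a file using the six field names from the announcement and a tiny standard-library parser. The sample values, the `parse_frontmatter` helper, and the handling of only flat `key: value` pairs and inline lists are assumptions; the actual OKF schema may differ, and a real tool would use a proper YAML parser.

```python
# Hypothetical sample; field names come from the announcement, values do not.
SAMPLE = """---
type: table
title: Orders
description: One row per customer order
resource: warehouse.sales.orders
tags: [sales, finance]
timestamp: 2026-01-15T00:00:00Z
---
# Orders

One row per customer order, refreshed nightly.
"""


def parse_frontmatter(doc: str) -> tuple[dict, str]:
    """Split a document into (frontmatter fields, markdown body)."""
    lines = doc.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, doc  # no frontmatter block

    end = lines.index("---", 1)  # ValueError if the block is never closed
    meta: dict = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

`parse_frontmatter(SAMPLE)` yields the six fields as a dictionary plus the markdown body, which is the split any indexer would need before making the structured fields queryable.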

Evolution Plus AI@evolutionplusai·3h

@Yuchenj_UW Interesting, will have to take it for a spin.

Yuchen Jin@Yuchenj_UW·6h

The future of coding is not one agent. It's a whole AI team. Omnigent lets you run a team of agents in one live session: Claude Code, Codex, Cursor, Pi, and your own agents. It is a meta-harness for AI agents, built from our internal Databricks dev tools, and now open-sourced for everyone. Built by the legendary @matei_zaharia and the Databricks AI team. And yes, Matei still writes a lot of code, even the frontend code for Omnigent and our products.

Evolution Plus AI@evolutionplusai·3h

@ConsciousRide Half this list is plumbing, half is judgment. The plumbing you can study. The judgment only shows up after you've shipped something that broke.

Akshay Shinde@ConsciousRide·9h

90% of AI Engineering interviews in 2026 come down to these 7 points.

1. LLM Fundamentals: tokenization, transformers & attention, fine-tuning (LoRA/QLoRA), context management, model selection
2. RAG Systems: chunking strategies, embeddings, vector databases, retrieval & reranking, hallucination mitigation
3. Agentic Workflows: tool calling & function calling, ReAct/Plan-Execute patterns, memory & state, multi-agent orchestration
4. Inference Optimization: quantization (AWQ/GGUF), serving engines (vLLM/TGI), batching & KV cache, latency vs cost tradeoffs
5. Evaluation & Observability: LLM-as-judge evals, custom metrics, A/B testing, drift detection, prompt/response logging
6. MLOps Pipelines: experiment tracking, model versioning & registries, CI/CD for AI, data pipelines, deployment automation
7. Production Realities: safety guardrails & prompt injection, scaling inference, cost optimization, debugging failures, compliance & reliability

Evolution Plus AI@evolutionplusai·3h

@mlejva Injecting auth at the sandbox boundary instead of the prompt is the right place for it. The secret the agent never sees is the one it can't leak.
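A rough picture of what "auth at the sandbox boundary" means: the agent builds requests without credentials, and a proxy outside the sandbox attaches them on the way out. The sketch below is a generic illustration, not the product being discussed; `Request`, `BoundaryProxy`, and the per-host secret map are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Request:
    url: str
    headers: dict[str, str] = field(default_factory=dict)


class BoundaryProxy:
    """Sits outside the sandbox and holds the secrets the agent never sees."""

    def __init__(self, secrets_by_host: dict[str, str]):
        self._secrets = secrets_by_host

    def forward(self, request: Request) -> Request:
        host = urlparse(request.url).hostname
        # Drop any Authorization header the agent tried to set itself.
        headers = {
            key: value
            for key, value in request.headers.items()
            if key.lower() != "authorization"
        }
        token = self._secrets.get(host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        return Request(request.url, headers)
```

Since the token exists only inside `BoundaryProxy`, nothing in the agent's prompt, context, or logs can contain it, which is the property the reply is pointing at.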

Evolution Plus AI@evolutionplusai·3h

@dvassallo Lower friction is what gets missed when building in the vibecoded era.

Daniel Vassallo@dvassallo·7h

A few months ago my kids started vibecoding little web games with Cursor and wanted their friends to play them. GitHub Pages was fine until the games needed real backends, so I hacked together a setup where each game was a folder in one repo that deployed to a Hetzner box on every push.

That held up until we shipped FULL SEND for Vibe Jam 2026 and it took off with 38,000+ players. The duct tape needed to become something real, so I rebuilt it properly and pulled it out into its own project.

It turns one Linux server into a push-to-deploy host for many apps. The whole thing is a single Go binary that installs and drives Docker, Kamal, Cloudflare, Tailscale, and GitHub for you. After that:

- Each app is a GitHub repo.
- A git push is live in <5 seconds.
- Deploys are zero-downtime.
- Each app runs in its own container.
- Automatic Cloudflare DNS and TLS tunnels.
- SQLite-aware backup and restore.

It's deliberately single server using convention over configuration, so for a typical app there's no YAML or Dockerfile to write. The idea is that one decent VPS can reliably run all your projects without per-app bills or piles of infra config.

It's built on top of Kamal, so it's basically a Kamal wrapper for the "lots of apps on one server" case, with the Cloudflare, Tailscale, DNS, and backup glue wired up by convention. Setup is one interactive command on a fresh Linux box, which walks you through connecting everything.

If you also have a bunch of projects you want to run on a single server, tell your Claude Code, Codex, Cursor, or favorite AI agent to grab a VPS and try it for you. It's fully open source and you can customize it to your liking: singleserver.com

Evolution Plus AI@evolutionplusai·3h

@0xor0ne MCP turning the model into an operator over real tooling is what makes this dangerous in the good way. What are you wiring it into for recon?

0xor0ne@0xor0ne·7h

Building autonomous vulnerability hunting lab with Claude Code + MCP blog.zsec.uk/bullyingllms/ #infosec #llm

Evolution Plus AI@evolutionplusai·3h

@VaibhavSisinty The bottleneck moved from code to how clearly you can write down intent.

Vaibhav Sisinty@VaibhavSisinty·7h

Vercel cooked something genuinely special here. 🤯

They open-sourced the exact framework they use to run 100+ AI agents internally. And the way it works changes how you think about building agents.

It's called Eve. An agent is a folder. Tools are files. Skills are markdown files. Channels are files. The folder structure IS your agent.

One command to start: npx eve@latest init my-agent

No plumbing. No boilerplate. Eve handles durable execution, sandboxed compute, human approvals, evals, tracing, and deployment, all built in.

Add a tool? Drop a TypeScript file. Add a skill? Drop a markdown file. Add Slack? One command. Add a schedule? One more file. Deploy it? vercel deploy.

How Vercel already runs on Eve:

→ Data analyst agent handles 30K+ questions per month in Slack
→ Sales agent costs $5K/year and returns 32x that
→ Support agent solves 92% of tickets on its own
→ 29% of all Vercel deployments now come from agents

Their bet: Next.js ended the era of hand-rolling websites. Eve ends the era of hand-rolling agents.

Vercel@vercel

Introducing eve, an agent framework.

agent/
  agent.ts
  instructions.md
  tools/
  skills/
  sandbox/
  schedules/

Like Next.js, for agents. vercel.com/blog/introduci…
