Andrea Luzzardi

874 posts

@aluzzardi

Co-founder @MendralHQ. Wrote @Docker’s first lines of code in the earliest days. Co-founded @dagger_io. Ex-Google · Ex-Microsoft.

San Francisco, CA · Joined January 2009
530 Following · 2.7K Followers
Andrea Luzzardi retweeted
Sam Alba @sam_alba
@davidfowl Yeah, coding agents are making the CI bottleneck worse, and we'll always need an integration loop. But the good news is that LLMs enable automations that were not possible before: agentic platform engineering is now a thing. This is what we're building at @MendralHQ, and it works.
Andrea Luzzardi retweeted
Mendral @MendralHQ
Shipped this week:
→ Per-repo branch filtering and severity thresholds
→ Editable implementation plans before the agent writes code
→ Similar Insights to catch duplicates across repos
→ Full session traceability on every agent action
Plus search/filter fixes, better CI log parsing, and tighter code reviews.
Mendral is an AI DevOps agent that monitors CI, diagnoses failures, and opens fix PRs. No quarantine. Actual fixes.
Andrea Luzzardi retweeted
Sam Alba @sam_alba
yeah! @MendralHQ is on HN 🔥
Andrea Luzzardi retweeted
Mendral @MendralHQ
New post: Anatomy of a Production AI Agent. How we built Mendral's multi-agent system: Opus for root cause analysis, Sonnet for evidence gathering, Haiku for log parsing. A decade of CI expertise encoded. 16,000+ investigations/month. Firecracker sandboxes on @blaxelAI. Durable execution on @inngest. Here's what makes it work. mendral.com/blog/anatomy-o…
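The tiering pattern the post describes is easy to sketch. A minimal illustration, assuming a stub `call_model` and hypothetical stage names; this is not Mendral's actual API, just the routing idea:

```python
# Tiered model routing: cheap models for high-volume stages, the
# expensive model for the single final reasoning step.
MODEL_FOR_STAGE = {
    "log_parsing": "haiku",          # high volume, cheapest tier
    "evidence_gathering": "sonnet",  # mid-tier reasoning
    "root_cause_analysis": "opus",   # one expensive call per investigation
}

def call_model(model: str, prompt: str) -> str:
    """Stub for an LLM call; a real system would hit an inference API here."""
    return f"[{model}] {prompt[:40]}"

def investigate(ci_log: list[str]) -> str:
    # Stage 1: parse every log line with the cheap model.
    parsed = [call_model(MODEL_FOR_STAGE["log_parsing"], line) for line in ci_log]
    # Stage 2: condense the parsed lines into evidence with the mid tier.
    evidence = call_model(MODEL_FOR_STAGE["evidence_gathering"], "\n".join(parsed))
    # Stage 3: a single root-cause call with the most capable model.
    return call_model(MODEL_FOR_STAGE["root_cause_analysis"], evidence)

print(investigate(["ERROR: test_login timed out", "exit code 1"]))
```

The economics are the point: at 16,000+ investigations/month, pushing the high-volume parsing down to the cheapest tier is what keeps the one expensive root-cause call per investigation affordable.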
Andrea Luzzardi retweeted
Sam Alba @sam_alba
Andrej is right. AI can write code in 30 minutes that used to take a weekend. But then it sits in CI for hours because a flaky test failed, a build timed out, or some dependency broke in main. The bottleneck isn't writing code anymore. It's shipping it. That's what we're fixing at @MendralHQ.
Quoting Andrej Karpathy @karpathy:

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.

It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

Andrea Luzzardi retweeted
Mendral @MendralHQ
Most engineering teams have no idea how healthy their CI actually is. How many failures per week? What's your P90 build time? Which 3 issues caused 80% of your failures? We built an agent that answers all of this. Here's what we shipped 👇
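Those three questions map to simple computations over CI job records. A rough sketch, assuming a hypothetical record shape of (duration, pass/fail, failure cause); the data is made up:

```python
# CI health from raw job records: weekly failure count, P90 build
# time, and the few causes behind most failures (Pareto check).
from collections import Counter
from statistics import quantiles

jobs = [
    # (duration_seconds, passed, failure_cause) -- hypothetical records
    (412, True, None), (980, False, "flaky: test_login"),
    (390, True, None), (1204, False, "flaky: test_login"),
    (455, True, None), (610, False, "build timeout"),
    (501, True, None), (725, False, "flaky: test_login"),
]

failures = [cause for _, ok, cause in jobs if not ok]
print("failures this week:", len(failures))

# P90 build time: the 9th of the 9 cut points dividing the data into 10.
durations = sorted(d for d, _, _ in jobs)
p90 = quantiles(durations, n=10)[8]
print(f"P90 build time: {p90:.0f}s")

# Which few causes account for most failures?
for cause, count in Counter(failures).most_common(3):
    print(f"{cause}: {count}/{len(failures)} failures")
```

The point of an agent here is not the arithmetic but keeping records like these accurate and queryable across every repo, continuously.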
Andrea Luzzardi retweeted
Mathis @MathisJoffre1
@hahnbeelee This is why we have @MendralHQ! Zero downtime, 💯 accurate.
Andrea Luzzardi retweeted
Sam Alba @sam_alba
Hot take on multi-agent AI: one giant model doing everything is the new monolith. We learned this 10 years ago with Docker. Monolithic apps couldn't scale. Smaller services with clear boundaries were the answer. The same thing is happening with AI agents now.
Andrea Luzzardi retweeted
Mendral @MendralHQ
Your CI fails on main. You dig through logs. It's the same flaky test from last week. We built an agent that diagnoses failures, tracks patterns, and opens PRs with fixes. Here's what we shipped this week 👇
Andrea Luzzardi retweeted
Mendral @MendralHQ
Just shipped a new demo for our homepage. CI breaks. Agent reads the logs. Opens a fix PR. Done. What would you want to see next?
Andrea Luzzardi retweeted
Mendral @MendralHQ
Flaky tests are the most expensive problem nobody talks about. At 10 engineers, a flaky test is annoying. At 100, it's a tax on everyone's productivity. Here's what we learned scaling Docker's CI from a handful of builds to thousands per day.
Andrea Luzzardi retweeted
Sam Alba @sam_alba
Everyone's building AI agent demos. Only a handful are shipping AI agents to production. The gap isn't intelligence. It's context. A model that doesn't know your codebase, your CI history, your team's patterns is just a fancy autocomplete with opinions.
Andrea Luzzardi retweeted
Mendral @MendralHQ
Your CI breaks. An engineer drops everything to debug it. 30 minutes later they find a flaky test. We built an agent that does this in seconds. Here's what we shipped this week 🧵
Andrea Luzzardi retweeted
Mendral @MendralHQ
Why this matters: PostHog runs 575,894 CI jobs/week. 33.4M test executions. 1.18B log lines. Every commit to main triggers 221 parallel jobs. 65 commits merged daily. 98 engineers, one monorepo. At that scale, even a 99.98% test pass rate generates real noise.
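The back-of-the-envelope math behind that last claim (the scale figures come from the tweet; the calculation is ours):

```python
# Even at a 99.98% pass rate, the stated weekly volume leaves
# thousands of failing test executions to triage.
executions_per_week = 33_400_000
pass_rate = 0.9998
failures = executions_per_week * (1 - pass_rate)
print(f"~{failures:,.0f} failing test executions/week")  # ~6,680
```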
Andrea Luzzardi retweeted
Mendral @MendralHQ
An AI agent just shipped a performance optimization to @PostHog's public repo. Not a suggestion. Not a report. A merged pull request that shards Playwright E2E tests into 4 parallel jobs. Before: ~11 min retries. After: ~3 min. Here's what it looks like 👇
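For context on the technique: Playwright supports splitting a suite across machines natively (e.g. `npx playwright test --shard=1/4`, one invocation per CI job). A generic sketch of the idea with hypothetical file names, using a hash-based split rather than Playwright's positional one:

```python
# Test sharding: deterministically split the suite so n CI jobs each
# run a disjoint subset in parallel. Hashing keeps assignments stable
# across runs, so shard caches and timings stay comparable.
import hashlib

def shard_for(test_file: str, num_shards: int) -> int:
    """Stable assignment: the same file always lands on the same shard."""
    digest = hashlib.sha256(test_file.encode()).hexdigest()
    return int(digest, 16) % num_shards

tests = ["auth.spec.ts", "billing.spec.ts", "editor.spec.ts", "replay.spec.ts"]
NUM_SHARDS = 4
for shard in range(NUM_SHARDS):
    mine = [t for t in tests if shard_for(t, NUM_SHARDS) == shard]
    print(f"shard {shard + 1}/{NUM_SHARDS}: {mine}")
```

Four shards trade a little per-job setup overhead for roughly a 4x cut in wall-clock time, which is consistent with the ~11 min → ~3 min numbers above.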
Andrea Luzzardi @aluzzardi
Twelve years later: shinier tools, longer YAML, same problems. I don't think CI is solved. I think we just stopped expecting it to be.
Andrea Luzzardi @aluzzardi
Then we managed to test Docker by running it in Docker. We called it hack/dind. Worked for six months before flaky tests and slow builds took over. Then people started ignoring failures because "that one always fails."
Andrea Luzzardi @aluzzardi
I set up Docker's first CI in 2014. Travis couldn't run Docker, so the config literally just deleted test files to avoid failures. A glorified linter.