

TestSprite
@Test_Sprite
Built for modern coders: AI agent that tests, fixes, and validates software.


Am I the only one getting vibe coding fatigue? Building landing pages in 30 seconds was fun, but maintaining a complex codebase where half the logic was “vibed” into existence is an absolute headache. Feels like we traded 1 hour of typing for 5 hours of architectural debugging later. I’ve started manually writing core logic again so I actually know where the technical debt is hiding. Is anyone successfully managing large production projects with AI agents, or are we all just building disposable software?

Imagine deploying 1,000,000 lines of code written in 6 days by AI that no human has ever read, let alone reviewed, to production where your customers’ data is. Imagine


Coding agents are accelerating different types of software work to different degrees. When we architect teams, understanding these distinctions helps us have realistic expectations. Listing functions from most accelerated to least, my order is: frontend development, backend, infrastructure, and research.

Frontend development — say, building a web page to serve descriptions of products for an ecommerce site — is dramatically sped up because coding agents are fluent in popular frontend languages like TypeScript and JavaScript and frameworks like React and Angular. Additionally, by examining what they have built by operating a web browser, coding agents are now very good at closing the loop and iterating on their own implementations. Granted, LLMs today are still weak at visual design, but given a design (or if a polished design isn’t important), the implementation is fast!

Backend development — say, building APIs to respond to queries requesting product data — is harder. It takes more work by human developers to steer modern models to think through corner cases that might lead to subtle bugs or security flaws. Further, a backend bug can lead to non-intuitive downstream effects like a corrupted database that occasionally returns incorrect results, which can be harder to debug than a typical frontend bug. Finally, although database migrations can be easier with coding agents, they’re still hard and need to be handled carefully to prevent data loss. So while backend development is much faster with coding agents, they accelerate it less than frontend work, and skilled developers still design and implement far better backends than inexperienced ones who use coding agents.

Infrastructure. Agents are even less effective in tasks like scaling an ecommerce site to 10K active users while maintaining 99.99% reliability. LLMs’ knowledge is still relatively limited with respect to infrastructure and the complex tradeoffs good engineers must make, so I rarely trust them for critical infra decisions. Building good infrastructure often requires a period of testing and experimentation, and coding agents can help with that, but ultimately that’s a significant bottleneck where fast AI coding does not help much. Lastly, finding infrastructure bugs — say, a subtle network misconfiguration — can be incredibly difficult and requires deep engineering expertise. Thus, I’ve found that coding agents accelerate critical infrastructure work even less than backend development.

Research. Coding agents accelerate research work even less. Research involves thinking through new ideas, formulating hypotheses, running experiments, interpreting them to potentially modify the hypotheses, and iterating until we reach conclusions. Coding agents can speed up the pace at which we write research code. (I also use coding agents to help me orchestrate and keep track of experiments, which makes it easier for a single researcher to manage more experiments.) But there is a lot of work in research other than coding, and today’s agents help with research only marginally.

Categorizing software work into frontend, backend, infra, and research is an extreme simplification, but having a simple mental model for how much different tasks have sped up has been useful for how I organize software teams. For example, I now ask frontend teams to implement products dramatically faster than a year ago, but my expectations for research teams have not shifted nearly as much.
I am fascinated by how to organize software teams to use coding agents to achieve speed, and will keep sharing my findings in future posts. [Original text: deeplearning.ai/the-batch/issu… ]
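To make the backend point concrete, here is a minimal sketch of the kind of corner-case handling the post says humans still have to steer agents toward. It is an Express-style handler; the in-memory `db` and all names are hypothetical stand-ins, not anything from the original post:

```ts
import express from 'express'

// In-memory stand-in for a real data layer (hypothetical shape).
const db = {
  async findProduct(id: string): Promise<object | null> {
    const products: Record<string, object> = {
      'abc-123': { id: 'abc-123', name: 'Demo product' },
    }
    return products[id] ?? null
  },
}

const app = express()

app.get('/products/:id', async (req, res) => {
  const { id } = req.params

  // Corner case 1: reject malformed IDs before they reach the data layer,
  // closing off injection-style inputs.
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(id)) {
    return res.status(400).json({ error: 'invalid product id' })
  }

  // Corner case 2: a missing row is an explicit 404, not an empty 200
  // that a frontend might silently render as a blank product page.
  const product = await db.findProduct(id)
  if (product === null) {
    return res.status(404).json({ error: 'product not found' })
  }

  return res.json(product)
})

app.listen(3000)
```

Each branch here is the sort of path a first-pass generation tends to skip, and it is exactly where the "subtle bugs or security flaws" described above tend to live.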




More and more people are asking me about testing resources, so let's put everything I've written in one post. Bookmark, share, and, most importantly, please read these.

The True Purpose of Testing
epicweb.dev/the-true-purpo…
Developers often overlook the fundamentals and rush into writing tests without properly understanding what a test is and what its function is. No test is inherently useful just because it exists. Read this one to learn what makes a test useful.

The Golden Rule of Assertions
epicweb.dev/the-golden-rul…
There's a lot of debate over what makes a good test. In this one, I define a short and objective way to grade a test's quality no matter the language or the tested system. This is, without a tinge of exaggeration, a game-changer in how you approach your tests.

Anatomy of a Test
epicweb.dev/anatomy-of-a-t…
Let's talk about the building blocks that make up any automated test. From JavaScript to Go and Rust—these blocks power tests everywhere. Know your blocks.

Implicit Assertions
epicweb.dev/implicit-asser…
Did you know there's a way to express expectations in tests without writing "expect"? Those are called implicit assertions, and they are tremendously powerful because they help you express more by writing less.

Inverse Assertions
epicweb.dev/inverse-assert…
Sometimes you need to assert that something did not happen. That can be tricky, especially if that something is asynchronous. The last thing you want is false positives. What you actually want is inverse assertions. (See the sketch after this list.)

Making Use of Code Coverage
epicweb.dev/making-use-of-…
Code coverage has been an ongoing debate in engineering circles. Is 100% code coverage in tests good? Bad? When should you strive for it? Why do people say it's harmful? I answer all those questions in this one and give you practical tips on when to use (and not to use) code coverage.

Good Code, Testable Code
epicweb.dev/good-code-test…
You've gathered by now that some code is easier to test than other code. But why? Let's take a look at the characteristics of code's testability: what defines it, what its relationship with complexity is, and how to make your code more testable.

What is a Test Boundary?
epicweb.dev/what-is-a-test…
Automated tests rarely involve your entire system (yes, even the end-to-end ones have exceptions). There's often a place where you draw the line. The boundary. Learn what it is and how to use test boundaries efficiently to focus on the exact behaviors you want to test.

Be S.M.A.R.T. About Flaky Tests
epicweb.dev/be-smart-about…
Flakiness is the scourge of reliability. If you've written a test before, you've likely had experience with flakiness. But what is it at its core, and what causes it? And how should you deal with it?

Writing Tests That Fail
epicweb.dev/writing-tests-…
You write tests for them to fail. We all enjoy a green CI, but the true value of tests is when they fail. What matters is when and how they fail.
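To make two of those ideas concrete, here is a minimal vitest sketch of an implicit assertion and an inverse assertion. The units under test (parseConfig, createWatcher) are hypothetical stand-ins inlined here so the sketch runs as-is:

```ts
import { test, expect, vi } from 'vitest'

// Hypothetical units under test, inlined so the example is self-contained.
function parseConfig(raw: string): { retries: number } {
  const parsed = JSON.parse(raw) // throws on malformed input
  if (typeof parsed.retries !== 'number') throw new Error('missing retries')
  return parsed
}

function createWatcher(opts: { onError: (e: Error) => void }) {
  return {
    check(input: string) {
      // Reports asynchronously, like a real watcher would.
      if (input.length === 0) {
        setTimeout(() => opts.onError(new Error('empty input')), 0)
      }
    },
  }
}

test('parses a valid config', () => {
  // Implicit assertion: if parseConfig throws on this input, the test
  // fails without a single expect() wrapped around the call.
  const config = parseConfig('{"retries": 3}')
  expect(config.retries).toBe(3)
})

test('does not report an error for valid input', async () => {
  const onError = vi.fn()
  const watcher = createWatcher({ onError })
  watcher.check('valid-input')

  // Inverse assertion: give the async error path a short window in which
  // it could fire, then assert the spy was never called.
  await new Promise((resolve) => setTimeout(resolve, 50))
  expect(onError).not.toHaveBeenCalled()
})
```

A fixed sleep is the bluntest way to open that window; tune it with care, since arbitrary waits are a classic source of the flakiness discussed above.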

AI slop is good, actually. Slop is what enables fast parallel experimentation. The etiquette and skill is understanding the boundaries of where slop exists, the extent to which it should be cleaned up, and how. A few examples:

I’m working on the internals of some system right now. The API and GUI of this thing are fully zero-shame slop. It’s horrible. But it lets me focus on the core quality while shipping a usable piece of alpha-quality software to testers (transparent about the slop frontend).

Similarly, this system has plugins. We sent agents in Ralph loops overnight to generate dozens of plugins. The plugins are slop. The quality is bad. The plugin API/SDK is absolutely not done. But we can test a full GUI with a full plugin ecosystem. When we change the API, we can regenerate them all. The cost of change is just tokens; the velocity is incomparable to before.

I built Terraform. We tested and shipped TF 0.1 with about 3 very weak providers, because we ran out of time. Building was slow. And when we changed our SDK, the cost was immense. Totally different today, 10 years later. Today, I would’ve slop-generated 100 providers (again, with transparency and cleanup later, but just to prove it out).

As an anti-example, I would not PR this (without prior warning) to another project. I would not throw this onto customers without full review or transparency (as I’m already doing). I would not accept first-pass slop. It’s almost never right.

Slop is a tool. And like anything else, it’s not blanket bad or good. The context is everything.
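For a sense of what "regenerate them all when the API changes" implies, here is a minimal sketch of such a plugin contract. All names are hypothetical; this is not the actual SDK from the post. The point is that as long as every generated plugin targets one typed interface, changing that interface and re-running the generation loop is the whole migration:

```ts
// Hypothetical plugin SDK: the single typed surface every generated
// plugin must satisfy. Change this interface and the migration cost is
// re-running the generation loop, i.e. just tokens.
export interface Host {
  registerCommand(id: string, run: () => void): void
  log(message: string): void
}

export interface Plugin {
  name: string
  version: string
  // Called once when the host loads the plugin.
  activate(host: Host): void | Promise<void>
}

// A generated plugin is just a module exporting this shape.
export const helloPlugin: Plugin = {
  name: 'hello',
  version: '0.1.0',
  activate(host) {
    host.registerCommand('hello.say', () =>
      host.log('hello from a generated plugin'),
    )
  },
}
```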

TestSprite: Interview With Co-Founder & CEO Yunhao Jiao About The Autonomous AI Testing Agent: TestSprite provides an autonomous AI testing agent that automatically generates, executes, and maintains end-to-end frontend and backend tests to validate… dlvr.it/TSNTVp





Vibe coding has done us in… We can’t sit still anymore without dreaming up a project or building something. Whenever the AI is idle, we get restless; new ideas keep popping up: “should we build this too? should we try that?” In the new video I talked a bit about the realities of this era: youtu.be/SKs1vSIXZ9o

they shipped their public launch today. their investor data room has been a public URL the whole time. 42K users, 78.9% D7 retention, $1 CAC. the numbers are typed in. and i can prove it. 🧵 not-so-serieous.vercel.app

We’re talking about Goblins. openai.com/index/where-th…

@karpathy and I are back! At @sequoia AI Ascent 2026. And a lot has changed. Last year, he coined “vibe coding”. This year, he’s never felt more behind as a programmer. The big shift: vibe coding raised the floor. Agentic engineering raises the ceiling. We talk about what it means to build seriously in the agent era. Not just moving faster. Building new things, with new tools, while preserving the parts that still require human taste, judgment, and understanding.

